Chapter 1


Introduction 


If  you  can  look  into  the  seeds  of  time, 

And  say  which  grain  will  grow  and  which  will  not, 

Speak. 

—  William  Shakespeare,  Macbeth,  Act  I,  Scene  iii,  line  58, 

This  thesis  describes  a  system  I  designed  and  implemented  to  allow  programs  written 
in  the  dataflow  language  Id  to  run  on  the  J-Machine,  a  massively-parallel  general-purpose 
computer.  The  system  is  functional  and  includes: 

•  A  compiler  that  recognizes  a  significant  portion  of  Id  and  produces  J-Machine  assembly 
code. 

• Library routines to provide operating system functions, fault handlers, and
language-specific features like I-structure storage.

•  A  strategy  for  aggressive  loop  parallelization. 

I  do  not  directly  address  the  question  of  how  to  sequentialize  portions  of  dataflow  graphs.  For 
this,  I  took  advantage  of  the  work  done  by  Ken  Traub  on  program  partitioning  [Traub  1988] 
and Robert Iannucci for his “dataflow / von Neumann hybrid” architecture and compiler
[Iannucci 1988]. With some optimizations, my system simulates Iannucci’s hybrid architecture
on the J-Machine. In this document, I describe and justify my approach, detail my
transformations, analyze the results, and present my conclusions about the project and future
research on dataflow computation for the J-Machine.


1.1  Background 

A large amount of research has gone into developing and implementing the dataflow model
of parallel computation. In order to exploit the parallelism revealed by dataflow techniques,
special-purpose dataflow machines have been built that are unlike traditional von Neumann
processors, using parallel machine languages and having token and I-structure memory.
Because individual instructions are scheduled dynamically on dataflow processors, run-time
overhead is unnecessarily high. On the other hand, dataflow architectures, with
their per-instruction synchronization, are better than von Neumann machines at
tolerating latency: If the data dependences allow some computation to be performed while the
previously-executing task is waiting for data, the processor will be kept busy. The motivation
for a hybrid architecture is to combine the latency toleration of a dataflow processor with the
efficiency of a von Neumann processor. Often, enough is known at compile-time to specify a
full ordering of a set of instructions, reducing the amount of run-time scheduling necessary.
Hybrid architectures attempt to take advantage of this knowledge by delineating sequences of
instructions whose order can be pre-determined, combining the exposed parallelism of dataflow
with the efficiency of von Neumann computation.¹

While combining instructions into sequential threads theoretically lessens the amount of
run-time parallelism available, it can be more practical in that it minimizes scheduling
overhead and allows the code to run on computers not dedicated to dataflow processing.
Additionally, even dataflow computers do not attempt to exploit the maximum possible
parallelism. For example, on Monsoon, a specific invocation of a procedure is generally not
divided among processors but takes place on a single one. Instead, the parallelism comes
from pipelining and from running iterations of one loop concurrently on separate processors
[Papadopoulos and Culler 1990], a feature that is retained by hybrid architectures. In order
to ensure that grouping instructions into threads does not lessen the ability to tolerate latency,

¹This justification of hybrid architectures based on latency toleration is due to ideas in [Iannucci 1988,
Chapters 1 and 2].
we obey “Iannucci’s Injunction” that instructions within a thread may not have unbounded
latency. Instructions with unbounded latency, such as procedure calls and global memory
accesses, cause a thread to suspend, allowing another to execute.

My work includes a compiler back-end to allow dataflow programs to run on the J-Machine,
a general-purpose massively-parallel computer. Although closer to the von Neumann model
than dataflow architectures, the J-Machine has many of the communication and
naming primitives needed for dataflow computation. I built my back-end on top of the Id
compiler developed by the Computation Structures Group at the MIT Laboratory for Computer
Science [Traub 1986a], as augmented by Robert Iannucci to produce code for his hybrid
architecture [Iannucci 1988]. My system transforms his hybrid code to run on the J-Machine.

1.1.1  Id 

Id is a primarily functional language developed in the Computation Structures Group of the
MIT Laboratory for Computer Science for programming dataflow and other parallel computers.
[Nikhil 1988] is a reference for the latest version. All of its features are supported by my
transformations, except for algebraic types, as they postdate Iannucci’s compiler on which
mine is based. A quick overview of pertinent features of the language is presented here.

Types 

The only primitive types in Id are booleans, characters, numbers, character strings, and
symbols.² Additionally, there are four pre-defined type constructors that take one or more
types and create new types:

• array types: (1D_array t), (2D_array t), ...

• list types: (list t)

• tuple types:

• function types: (t0 → t1)

²In the latest version of Id, booleans are not primitive but are defined with algebraic types, which we were
unable to support, as described above.
Id is strongly-typed in that extensive compile-time and run-time type-checking is
performed, but users rarely explicitly provide type information. Additionally, Id allows
polymorphism.

Function  Application 

The application of function f with arguments a1, ..., an is written:

f a1 ... an

Id also supports currying: If function f “expects” two arguments, f a, instead of being illegal
as in most languages, returns a function that takes one argument. For example, if plus is
defined as a function that takes two numbers and adds them, plus 3 returns a function that
takes one number as an argument and adds 3 to it. As will be seen later, currying causes
additional overhead in run-time procedure linkage.

I-Structures 

One major argument against purely functional languages is their suboptimal efficiency with
arrays. Specifically, it is unnecessarily wasteful to copy an entire array when modifying one
element. Filling in the n elements of a previously-empty array can take O(n²) time and space,
as the entire array is recopied when each element is written. This problem was partially
solved with I-structures, arrays with elements that can only be written to once. After an
element has been written, reads of it proceed as expected; subsequent writes are a run-time
error. Because no copying is done, filling an array of I-structures takes O(n) time. If a read
takes place before a write, the read is silently deferred until the data is available. This
process is illustrated in Figure 1-1. Out-of-bound accesses to I-structures cause run-time
errors. The properties of I-structures guarantee deterministic behavior in legal programs.³
While keeping Id from being purely functional, they greatly improve its efficiency without
harming abstraction. Tuples and arrays, described above, are implemented as I-structures.
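
The write-once behavior just described can be sketched in a few lines of Python. This is only an illustrative model, not the thesis implementation: the class name IStructureCell, the use of callbacks to stand for deferred readers, and the exception type are all assumptions made for the example.

```python
from typing import Any, Callable, List


class IStructureCell:
    """One write-once location: empty, then (possibly) deferred reads, then full."""

    def __init__(self) -> None:
        self.full = False
        self.value: Any = None
        self.deferred: List[Callable[[Any], None]] = []  # readers waiting for data

    def read(self, reader: Callable[[Any], None]) -> None:
        if self.full:
            reader(self.value)            # data present: satisfy the read at once
        else:
            self.deferred.append(reader)  # silently defer until the write

    def write(self, value: Any) -> None:
        if self.full:
            raise RuntimeError("I-structure location written more than once")
        self.full, self.value = True, value
        pending, self.deferred = self.deferred, []
        for reader in pending:            # fulfil every deferred read
            reader(value)


seen: List[Any] = []
cell = IStructureCell()
cell.read(seen.append)   # read before write: deferred, nothing delivered yet
cell.write(42)           # the write releases the deferred read
cell.read(seen.append)   # read after write: served immediately
```

Because a location never changes once it is full, every reader observes the same value regardless of when its read was issued, which is the source of the determinism mentioned above.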

In  addition  to  supporting  user  types,  I-structures  are  used  to  create  closures  for  currying 

³Here and elsewhere, a legal program is one in which no compile-time or run-time errors occur.
Figure 1-1: An FSM Description of an I-structure Location. Originally, an I-structure location
is empty. Reads are silently deferred until data has arrived. Once data has been written,
pending and subsequent read requests can be fulfilled. Writing a location more than once is
a run-time error.

procedure  calls.  Whenever  an  argument  is  applied  to  a  procedure,  a  check  is  made  whether 
the  argument  supplied  is  the  last  one.  If  so,  the  procedure  is  invoked;  otherwise,  the  argument 
is  added  to  the  I-structure  list  of  arguments  and  saved  into  a  closure. 
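
The last-argument check can be illustrated with a small Python sketch. The Closure class, its arity field, and the use of a tuple in place of the I-structure argument chain are assumptions for illustration only; the actual run-time representation is described later in the document.

```python
from typing import Any, Callable, Tuple


class Closure:
    """A procedure together with the chain of arguments supplied so far."""

    def __init__(self, proc: Callable[..., Any], arity: int,
                 args: Tuple[Any, ...] = ()) -> None:
        self.proc, self.arity, self.args = proc, arity, args

    def apply(self, arg: Any) -> Any:
        args = self.args + (arg,)
        if len(args) == self.arity:                  # last argument: invoke
            return self.proc(*args)
        return Closure(self.proc, self.arity, args)  # otherwise save a closure


plus = Closure(lambda a, b: a + b, arity=2)
add3 = plus.apply(3)     # corresponds to "plus 3" in the text
result = add3.apply(4)
```

Every application pays for the arity check, and every partial application allocates a new closure, which gives a feel for the extra linkage overhead attributed to currying.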

Blocks 

Blocks in Id provide a mechanism to bind names to values within the block’s body. They are
analogous to Lisp’s let construct, except that, as in all Id constructs, the textual order of the
statements is ignored. A block to compute the surface area of a cylinder, given its radius r
and height h, could be written:

{ face = Pi * r * r;
  body = 2 * Pi * r * h
in
  2 * face + body }

Note  that  it  is  not  always  possible  to  statically  determine  the  order  in  which  statements 
in  the  “declaration”  section  of  a  block  will  execute.  Consider  the  following  example  from 
[Traub  1989,  page  2]: 

{ p = x > 0;
  a = if p then bb else 3;
  b = if p then 4 else aa;
  aa = a + 5;
  bb = b + 6;
  c = a + b
in
  c };

If x > 0, the only possible order of evaluation is: p, b, bb, a, aa, c. If x ≤ 0, the
expressions must be evaluated in a different order: p, a, aa, b, bb, c. This is an example of
an Id fragment in which the order of execution of statements cannot be determined at
compile-time. It establishes a theoretical limit on compile-time scheduling, beyond any
practical limits due to insufficiently sophisticated compilers, because no single compile-time
schedule is correct for both cases.
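
The two evaluation orders can be reproduced with a small data-driven interpreter in Python. This is a sketch under stated assumptions: the bindings are scanned in textual order and each one fires as soon as its inputs exist, which is one simple way to mimic dataflow firing and is not how Id is actually compiled. The names BINDINGS and firing_order are invented for the example.

```python
from typing import Any, Callable, Dict, List

Env = Dict[str, Any]

# Each binding reads the names it needs from env; a KeyError means "not ready yet".
BINDINGS: Dict[str, Callable[[Env], Any]] = {
    "p":  lambda e: e["x"] > 0,
    "a":  lambda e: e["bb"] if e["p"] else 3,
    "b":  lambda e: 4 if e["p"] else e["aa"],
    "aa": lambda e: e["a"] + 5,
    "bb": lambda e: e["b"] + 6,
    "c":  lambda e: e["a"] + e["b"],
}


def firing_order(x: int) -> List[str]:
    """Fire each binding once its inputs are available; return the order used."""
    env: Env = {"x": x}
    order: List[str] = []
    while len(order) < len(BINDINGS):
        progressed = False
        for name, rule in BINDINGS.items():
            if name in env:
                continue
            try:
                env[name] = rule(env)
            except KeyError:
                continue              # an input is still missing; retry later
            order.append(name)
            progressed = True
        if not progressed:
            raise RuntimeError("deadlock: no binding can fire")
    return order
```

Calling firing_order(1) and firing_order(0) yields different sequences, so no single fixed ordering of the six statements serves both inputs.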

Loops 

The  format  of  a  loop  statement  is: 

{for x <- e_index do
   <statement>;
   ...
   <statement>
 finally e}

The  keyword  next  is  provided  to  refer  to  the  next  value  of  a  loop  iteration.  For  example,  a 
loop  to  add  the  first  n  integers  would  be  written: 

{ sum = 0
in
  { for count <- 1 to n do
      next sum = sum + count
    finally sum }}

The semantics of Id are such that it is possible for multiple iterations of a loop to execute
in parallel. Iannucci’s compiler for the hybrid architecture has loops execute in “parallel”
on a single processor, i.e. statements in iteration i+1 may execute before statements in
iteration i, as long as data dependences are respected. Inner loops are put in separate
codeblocks and can be spawned to separate processors.

Many years have been spent developing and optimizing an Id compiler for the Tagged-Token
Dataflow Architecture [Traub 1986a], a paper dataflow architecture. This compiler was
the base of Iannucci’s and of my research.
1.1.2 Iannucci’s Hybrid Architecture

Development of hybrid architectures is an active area of research. See [Gaudiot and Bic 1989]
for a summary of recent research in the area. One of the best known hybrid architectures is
the EM-4 being developed at the Electrotechnical Laboratory in Japan [Sakai et al 1989]. I
chose to base my work on Iannucci’s system because of the ease with which I could access his
compiler, developed at MIT, as well as its quality.

Iannucci’s extensions to the Id compiler make use of information available at compile-time
to create scheduling quanta (SQs), sequences of code within which the order is specified at
compile-time. Invocation of a codeblock or procedure takes place on a single processor and
generally consists of many SQs.⁴ When a procedure is invoked, the instructions in the first SQ
are executed sequentially, suspending at the end of the SQ or if a fault occurs, signifying that
needed data is not ready. The execution of other SQs results from explicit forks.⁵ The length
of scheduling quanta is limited by the level of the compiler’s analysis and by the requirements
of Id. Arguments, local variables, and all but the most ephemeral of temporaries are stored
within a frame allocated when the codeblock is invoked. My implementation for the J-Machine
includes all of these characteristics. Further details about Iannucci’s implementation and
architecture will be provided as needed throughout the document. Henceforth, when I write
“the hybrid architecture,” I mean to refer to Iannucci’s architecture.


1.1.3  The  J-Machine 

The target of my system is the J-Machine, a massively-parallel MIMD computer based on
the Message-Driven Processor (MDP). Each processor has 260K (4K on chip) of 32-bit-word
memory augmented with 4-bit tags. Tag types include booleans, integers, symbols, and
cfutures. Cfutures generate faults on most operations. The MDPs communicate with each other
through a low-latency network by sending messages. When a message arrives at a processor,


⁴To be exact, it is not always true that a procedure invocation executes on a single processor. More
precisely, a codeblock invocation executes on a single processor. A procedure is usually one codeblock, but
there are exceptions. When interior procedures are lambda-lifted out of a procedure definition, they constitute
separate codeblocks, as do inner loops, so that they can be spawned among processors. Occasionally in the
document, I provide simplified explanations whose exact details are fleshed out later.

⁵Throughout this document, I use “fork” to mean enabling a continuation on the current processor and
“spawn” for enabling a continuation on another processor.
it is written into the message queue. When the message gets to the head of the queue, its first
word is loaded into the instruction pointer, and a pointer to the base of the message is loaded
into an address register so that subsequent words may be accessed. Execution continues
sequentially until an explicit suspend instruction. The first J-Machine is expected to be built
within a year and will have thousands of processors. For my research, I used a simulator of
a 32-node J-Machine [Horwat and Totty 1987]. See [Dally et al 1988b] for a more complete
description of the Message-Driven Processor.

1.2  Overview 

In Chapter 2, I provide an overview of how the code is executed on the J-Machine,
describing the run-time structures and control structure transformations. Chapter 3 describes my
compiler and how it fits on top of the Id-to-hybrid compiler, as well as showing the code
production templates. Chapter 4 provides benchmarks, including an extended example of
the transformation and execution of a simple factorial program. Chapter 5 is the conclusion,
presenting my retrospective opinions on the project and describing ways in which it could be
improved. The appendices include program examples and source code.
Chapter 2


Executing  Hybrid  Code  on  the 
J-Machine 


The  villainy  you  teach  me  I  will  execute, 
and  it  shall  go  hard, 
but  I  will  better  the  instruction. 

— William Shakespeare, The Merchant of Venice, Act III, scene i, line 76.

Because  Id  is  designed  for  dataflow  processors  —  its  name  stands  for  Irvine  Dataflow  — 
its  run-time  demands  are  different  from  those  of  traditional  imperative  languages  designed 
for  von  Neumann  processors.  On  dataflow  architectures,  such  as  the  Tagged-Token  Dataflow 
Architecture and Monsoon, instructions are scheduled individually as soon as the data
dependences have been satisfied. It would not be reasonable to attempt to imitate this on a
non-dataflow  architecture:  When  I  hand-compiled  Id  programs  onto  the  J-Machine  with  such 
a  strategy,  overhead  was  extremely  high.  For  a  typical  dataflow  instruction,  such  as  plus,  with 
two  sources  and  two  sinks,  20  MDP  instructions  were  executed  [Spertus  1989]. 

One  of  the  major  goals  of  compiling  any  language  is  to  do  as  much  work  as  possible  at 
compile-time,  leaving  a  minimum  of  work  for  run-time.  Thus  before  running  dataflow  code 
on  a  von  Neumann  processor,  the  compiler  should  sequentialize  sequences  of  instructions  as 
much  as  possible.  In  [Traub  1988],  a  method  of  sequentializing  regions  of  code  into  threads, 
or scheduling quanta (SQs), is presented. This lessens the amount of run-time overhead
considerably; however, it does not reduce it to zero. Because it cannot be determined statically
what order the SQs must run in (if it were known, the SQs would already have been combined),
some run-time scheduling is necessary. Specifically, SQs are explicitly forked as soon as
the necessary data might be present. They may begin executing any time thereafter. Within
a SQ, checks are performed to see if necessary data is present. If it is not, the SQ suspends,
to try again once the data is received. Run-time support is necessary for these operations.
In this chapter, I describe the run-time behavior of the programs at a detailed but relatively
high level. I go into lower level detail in the following chapters.

2.1  Overview 

Program execution on the J-Machine is based on the same ideas as on the hybrid architecture:
Instructions  are  grouped  into  scheduling  quanta  subject  to  the  following  constraints: 

1.  The  program  yields  the  same  results  as  pure  dataflow  computation. 

2.  No  deadlocks  are  introduced. 

3. An instruction with unbounded latency must not be within a SQ.

Because I work with the scheduling quanta produced by Iannucci’s compiler, I inherit the
assurance that the partitioning yields correct and terminating results [Iannucci 1988, Chapter
4].¹ As Iannucci did, I divide all unbounded-latency tasks into multiple phases so that other
tasks can execute between initiation and fulfillment of a request.

When  a  codeblock  is  invoked,  a  contiguous  region  of  memory  called  a  frame  is  allocated 
for  its  arguments  and  scratch  variables.  The  frame  is  given  a  unique  global  name.  Because 
each  invocation  has  its  own  data  area,  the  same  procedure  can  execute  multiple  times  on  one 
processor,  with  execution  of  the  invocations  interleaved.  After  a  codeblock  starts  executing, 
it will probably fault on a slot in its frame; i.e. it will look for a value in a specific slot of
the  frame,  but  the  data  will  not  be  present.  In  this  case,  a  continuation  is  created  encoding 
the  code  address  and  is  stored  into  the  offending  slot.  When  the  data  arrives,  the  data  will  be 

¹It is not entirely true that I use the SQ divisions unchanged. As will be discussed in the next chapter,
there are a few cases in which I tweak SQs.
Figure 2-1: Run-Time Data Structures. Slots 1 and 4 of the callee’s frame are empty, signifying
that  the  corresponding  data  values  have  not  arrived  yet  and  have  not  been  requested.  The 
data  for  slots  0,  2,  and  3  have  arrived.  Slot  0  points  to  the  caller’s  frame  so  that  the  return 
value  can  be  sent  there.  The  data  for  slot  5  has  not  arrived.  The  presence  of  a  continuation 
list  indicates  that  instructions  in  the  codeblock  have  tried  to  access  slot  5.  When  the  data 
arrives,  the  SQs  indicated  in  the  codeblock  will  be  restarted. 


written into the frame slot and the continuation will be re-enabled. When all of the SQs in a
codeblock  have  successfully  completed  and  any  return  values  have  been  sent  to  the  caller,  the 
frame  can  be  freed.  These  structures  are  shown  in  Figure  2-1.  The  following  sections  describe 
them  in  more  detail. 


2.2  Data  Structures 

2.2.1  Codeblocks 


A  codeblock  consists  of  one  or  more  scheduling  quanta  stored  contiguously  on  each  processor 
on which the procedure might be invoked. Unlike in [Horwat 1989], code is distributed at
load-time. The format of a pointer to a codeblock is shown in Figure 2-2. A user-defined tag value,
CB, is used to indicate a pointer to a codeblock.² The low sixteen bits of the descriptor hold


²In this context, “user-defined” means defined by my dataflow system, as opposed to the hardware-specified
tag types on the MDP. The MDP has 9 pre-defined tag types and 4 user-defined types.
Figure 2-2: A Pointer to a Codeblock. The user-defined tag CB denotes a pointer to a
codeblock.  The  low  sixteen  bits  tell  how  large  a  frame  must  be  allocated  for  the  codeblock  to 
execute.  The  high  sixteen  bits  tell  where  the  codeblock  can  be  found. 

the  number  of  words  of  storage  required  for  each  invocation,  and  the  high  sixteen  bits  hold 
the  address  of  the  first  SQ  in  the  codeblock. 

2.2.2  The  Data  Stack 

Memory is allocated from a stack, initialized to null cfutures. A cfuture is an MDP data type
on which most instructions fault. Thus, slots are pre-initialized to “empty”. A heap would
be a more efficient representation because memory could be freed and reused, but not enough
time was available to implement one. The three run-time data structures allocated from the
stack are frames, continuations, and I-structures, described in the following sections.

2.2.3  Frames 

For  a  codeblock  to  execute,  it  needs  a  frame,  a  contiguous  block  of  storage  initialized  to 
null  cfutures  (i.e.  to  empty).  A  pointer  to  the  base  of  a  frame  is  called  a  frame  descriptor. 
Figure  2-3  shows  a  frame  descriptor  and  a  procedure  frame.  A  user-defined  tag  value,  FD, 
is  used  to  indicate  a  pointer  to  a  frame.  The  low  sixteen  bits  of  the  descriptor  hold  the 
node  number,  and  the  high  sixteen  bits  hold  the  local  address,  combining  to  provide  a  global 
address.  Storing  the  node  number  in  the  low  sixteen  bits  provides  an  efficiency  bonus  on  the 
J-Machine as first described in [Horwat 1989, page 68].
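
The two 32-bit descriptor layouts described so far (codeblock pointers and frame descriptors) can be made concrete with a short Python sketch. The Word type, the helper names, and the example addresses are assumptions for illustration; the MDP's 4-bit tag is represented here by a string rather than by its real encoding, which the text does not give.

```python
from typing import NamedTuple

MASK16 = 0xFFFF


class Word(NamedTuple):
    tag: str   # stands in for the MDP's 4-bit tag; the real encoding is not modelled
    bits: int  # 32-bit payload


def make_codeblock_pointer(first_sq_address: int, frame_words: int) -> Word:
    """CB word: high 16 bits = address of the first SQ, low 16 bits = frame size."""
    assert 0 <= first_sq_address <= MASK16 and 0 <= frame_words <= MASK16
    return Word("CB", (first_sq_address << 16) | frame_words)


def make_frame_descriptor(local_address: int, node: int) -> Word:
    """FD word: high 16 bits = local address, low 16 bits = node number."""
    assert 0 <= local_address <= MASK16 and 0 <= node <= MASK16
    return Word("FD", (local_address << 16) | node)


def high16(word: Word) -> int:
    return (word.bits >> 16) & MASK16


def low16(word: Word) -> int:
    return word.bits & MASK16


# Hypothetical values chosen only to show the packing.
cb = make_codeblock_pointer(first_sq_address=0x0400, frame_words=12)
fd = make_frame_descriptor(local_address=0x1F00, node=5)
```

With this layout, the frame size needed to invoke a codeblock is low16(cb), and the node that owns a frame is low16(fd).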
Figure 2-3: A Non-Loop Procedure Frame. A user-defined tag, FD, denotes a frame descriptor.
It encodes the unique global address of a frame. The first slot of a frame holds a frame
descriptor indicating where to send return values. The next slot holds the address of the
I-structure chain of arguments. In some cases, the arguments can be passed directly in argument
slots. The remaining slots are used for scratch values during the procedure’s execution.


Slot  0  of  the  frame  holds  a  frame  descriptor  telling  where  to  send  any  return  values. 
Some  subtleties  are  involved  in  whether  the  arguments  are  passed  in  argument  slots  or  as 
an I-structure chain. I retain Iannucci’s conventions, and the interested reader is referred to
[Iannucci 1988, pages 111-113]. The additional slots present in codeblocks with loops will be
discussed in Section 2.3.3. Except for how I handle loops, my frames are identical to those
used by Iannucci. The base of the frame currently executing is always kept in MDP address
register A2. Taking all frame accesses relative to A2 allows multiple invocations of a procedure
to run on the same processor.


2.2.4  Continuations 

When an attempt is made to read an empty frame slot (i.e. a cfuture), a fault occurs whose
handler  does  the  following: 

1.  Stores  a  request  to  restart  the  SQ  when  the  data  arrives. 

2.  Suspends,  in  order  to  let  another  SQ  execute. 
In producing code, I ensure that at the time of a cfuture fault, the MDP register R0 holds a
message indicating where execution should restart. I also take advantage of the MDP’s always
storing the absolute address of the last memory access in the MAR register. This allows the
fault handler to determine which piece of data was missing. The handler allocates a triple
(i.e. three words) from the stack and sets them to the following:

1. A message indicating where execution should restart (taken from R0).

2.  The  base  of  the  current  frame  (taken  from  A2). 

3.  A  pointer  to  the  next  continuation  (if  any)  waiting  on  the  faulted  location.  This  is  the 
old  value  of  the  slot. 

The address of the triple is tagged as a cfuture and is written into the data location for which
the fault occurred.³ When the data arrives, the slot is checked just before the data is written.
For every continuation present, the indicated message is sent and the continuation freed.⁴
Because codeblocks execute within one processor, the message is sent from the processor to
itself. J-Machine routing is done in such a manner that this is a cheap operation. Allocating
and filling a continuation after a fault takes 18 cycles. Writing to a frame slot takes 7 cycles
if no continuations are waiting and 8 + 6 * w cycles if w continuations are waiting.
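
The continuation chain and the write cost can be sketched in Python as follows. The class and field names are assumptions made for illustration, restart messages are modelled as plain strings, and w is used here as the name for the number of waiting continuations. The cycle figures are the ones quoted above, not measurements of this sketch.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Continuation:
    restart_message: str             # where execution should resume (from R0)
    frame_base: int                  # base of the faulting frame (from A2)
    next: Optional["Continuation"]   # previous contents of the slot, if any


class Slot:
    """A frame slot: empty, a chain of waiting continuations, or data."""

    def __init__(self) -> None:
        self.waiting: Optional[Continuation] = None
        self.full = False
        self.value: Any = None

    def fault(self, restart_message: str, frame_base: int) -> None:
        # Push a new triple whose third word is the old value of the slot.
        self.waiting = Continuation(restart_message, frame_base, self.waiting)

    def write(self, value: Any) -> List[str]:
        """Store the data and return the restart messages that would be sent."""
        sent: List[str] = []
        cont = self.waiting
        while cont is not None:      # one message per waiting continuation
            sent.append(cont.restart_message)
            cont = cont.next
        self.waiting, self.full, self.value = None, True, value
        return sent


def write_cycles(w: int) -> int:
    """Cycle counts quoted in the text for a frame-slot write."""
    return 7 if w == 0 else 8 + 6 * w


slot = Slot()
slot.fault("restart A", frame_base=0x1F00)
slot.fault("restart C", frame_base=0x1F00)
messages = slot.write(10)
```

Because each new triple is linked in front of the old slot contents, the most recent faulter is restarted first in this sketch; with two waiters the quoted formula gives 8 + 6 * 2 = 20 cycles for the write.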

An  Alternate  Method  for  Continuations 

I  considered  an  alternate  method  of  keeping  track  of  suspended  continuations.  Instead  of 
storing  the  continuation  in  a  tuple  allocated  from  the  stack,  the  system  could  immediately 
send  the  message  indicating  where  execution  should  restart,  effectively  putting  it  at  the  end 
of  the  local  message  queue.  When  the  message  reaches  the  head  of  the  queue,  it  is  tried  again. 
If  the  data  has  arrived,  it  executes  successfully  (or  at  least  until  the  next  fault);  otherwise,  it 
will  throw  itself  on  the  queue  again. 

This  method  has  several  advantages: 

1. It seems to fit more elegantly on top of the J-Machine, taking advantage of the message
queue provided.

³To be precise, a quadruple is sometimes needed instead of a triple, as will be explained in Section 2.3.3.
⁴Due to the primitive memory management of my system, the locations are freed in concept only.
Figure 2-4: An I-Structure Descriptor and Storage. An I-structure descriptor includes its type
and  a  global  address  that  points  to  a  block  of  storage,  holding  the  bounds  and  the  data. 

2.  Message  suspension  executes  more  quickly. 

3.  There  is  no  need  to  check  a  frame  location  before  writing  a  value  to  it. 

The  disadvantages,  however,  are  major:  A  SQ  could  restart  and  fail  many  times,  using  an 
unbounded  number  of  machine  cycles.  Additionally,  the  MDP  message  queue  could  overflow. 
For  these  reasons,  I  decided  not  to  use  this  method. 

2.2.5  I-Structures 

I-structures are defined in Section 1.1.1. To review, they are array-like data structures whose
entries can be written once. Reads before writes are silently deferred. (This shows one of the
reasons high latency toleration is necessary.) I-structures are allocated explicitly by the user
and implicitly for argument chains for procedure calls. Due to time constraints, I-structures
are not handled by my compiler; however, I did develop and test the translation methods that
would be used.

Figure  2-4  shows  how  I-structure  descriptors  and  storage  are  implemented.  I-structure 
descriptors  are  built  analogously  to  frame  descriptors,  using  the  user-defined  tag  name  ITAG. 
The low and high bounds of the I-structure are stored at the base of the region of storage,
after  which  the  data  appear  sequentially. 

For  a  given  cell  of  I-structure  storage,  there  are  three  possible  states,  corresponding  to  the 
non-error  states  in  Figure  1-1.  The  possibilities,  and  how  they  are  indicated,  are: 

1.  Empty,  indicated  by  a  null  cfuture. 
Figure 2-5: An I-Structure. The lower and upper bounds of this I-structure are 5 and 8,
respectively. When a read or write request arrives, a run-time error occurs if the passed-in
offset is out of bounds. If not, the lower bound is subtracted from the passed-in offset, and
the  corresponding  cell  is  examined.  In  this  example,  data  has  been  written  to  I[5]  and  I[7], 
there  have  been  no  attempts  to  read  or  write  I[6],  and  there  have  been  two  reads  to  I[8]  that 
will  be  satisfied  when  the  data  arrives.  Writing  to  a  slot  more  than  once  is  a  run-time  error. 

2. Waiting for data, indicated by a cfuture whose value points to a local linked list of
continuations needing the data.

3.  Full,  indicated  by  a  non-future  (i.e.  the  data  itself). 

The continuations are of the same form as described in Section 2.2.4. An example of an
I-structure is shown in Figure 2-5.

Writing an element of an I-structure takes 20 + 6 * r instructions, where r is the number
of pending requests. The read handler takes [number illegible] instructions if the data is present and 30
if  it  is  not.  These  times  include  comparing  against  the  bounds,  subtracting  off  the  lower 
boimd,  ensuring  that  no  more  thfin  one  write  is  done,  and  allocating  any  memory  needed  for 
continuations. 
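The three cell states and the bounds check can be illustrated with a small model. The following is a minimal sketch in Python, not the MDP library code: the class and method names are hypothetical, a sentinel object stands in for the null cfuture, and a continuation is modeled as a plain callable rather than the continuation format of Section 2.2.4. The demonstration at the end reproduces the situation in Figure 2-5.

```python
class IStructureError(Exception):
    """Run-time error: out-of-bounds access or a second write to a cell."""


EMPTY = object()  # stands in for the null cfuture


class Waiting:
    """Stands in for a cfuture that points to a list of deferred readers."""

    def __init__(self):
        self.continuations = []


class IStructure:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.cells = [EMPTY] * (high - low + 1)

    def _index(self, offset):
        # Compare against the bounds, then subtract off the lower bound.
        if not self.low <= offset <= self.high:
            raise IStructureError(f"offset {offset} out of bounds")
        return offset - self.low

    def read(self, offset, continuation):
        i = self._index(offset)
        cell = self.cells[i]
        if cell is EMPTY:
            cell = self.cells[i] = Waiting()
        if isinstance(cell, Waiting):
            cell.continuations.append(continuation)  # defer the read
        else:
            continuation(cell)                       # full: answer at once

    def write(self, offset, value):
        i = self._index(offset)
        cell = self.cells[i]
        if cell is not EMPTY and not isinstance(cell, Waiting):
            raise IStructureError(f"second write to offset {offset}")
        self.cells[i] = value
        if isinstance(cell, Waiting):
            for continuation in cell.continuations:  # satisfy pending reads
                continuation(value)


# The I-structure of Figure 2-5: bounds 5 and 8, I[5] and I[7] written,
# I[6] untouched, and two reads pending on I[8].
istruct = IStructure(5, 8)
istruct.write(5, "x")
istruct.write(7, "y")
got = []
istruct.read(8, got.append)
istruct.read(8, got.append)
istruct.write(8, 42)  # both deferred reads are now satisfied
```

The loop over pending continuations in write corresponds to the per-request term r in the instruction count above.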


2.3  Control  Structure 

2.3.1  Execution  Within  a  Codeblock 

To see how execution proceeds within a codeblock, let us review the example block from
Section 1.1.1. It is reproduced in Figure 2-6. Consider the possible orders of evaluation:

• If x > 0, b → bb → a → aa → c.

• If x ≤ 0, a → aa → b → bb → c.
def abc x =
    { p = x > 0;
      a = if p then bb else 3;
      b = if p then 4 else aa;
      aa = a + 5;
      bb = b + 6;
      c = a + b;
    in
      c};


Figure 2-6: A Statically Unschedulable Codeblock. It is impossible to determine the order in
which a, b, aa, and bb must be computed without knowing whether x > 0.


Observe that in both cases, b precedes bb, a precedes aa, p is the first calculation, and c is the
last. Using these static dependences, we partition the code into four scheduling quanta (P, A,
B, and C), as shown in Figure 2-7.*

Let us consider the case where x > 0. P is the first SQ to execute. As shown in Figure 2-8,
it computes p then forks A, B, and C, in that order, and suspends. A begins, then suspends,
because bb is needed but not available. B, next in the queue, begins and executes to completion.
When it stores bb, it sees that A is waiting on the value and sends a message to restart A. C
then begins executing and faults on a, suspending. The second attempt to execute A is now
at the head of the message queue and completes, sending a request to restart C. C executes,
performing the addition and whatever else follows (such as returning the resulting value).
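The walk-through above can be replayed with a small model of the message queue. The following is a minimal sketch in Python under stated assumptions: each SQ is modeled as a generator that yields the name of a frame slot it needs, a slot that is absent from the dictionary counts as empty, and the names run_abc, store, and sq_p through sq_c are hypothetical. It is not the code my compiler emits.

```python
from collections import deque


def run_abc(x):
    """Run the four SQs of Figure 2-6; return (c, order in which SQs ran)."""
    frame = {}       # slot name -> value, present once written
    waiting = {}     # slot name -> SQs suspended on that slot
    queue = deque()  # message queue of (name, generator, slot awaited)
    trace = []

    def store(slot, value):
        frame[slot] = value
        queue.extend(waiting.pop(slot, []))  # restart anything waiting on it

    def sq_p():
        store("p", x > 0)
        for name, make in (("A", sq_a), ("B", sq_b), ("C", sq_c)):
            queue.append((name, make(), None))  # fork A, B, and C in order
        return
        yield  # unreachable; makes sq_p a generator like the other SQs

    def sq_a():
        p = yield "p"
        a = (yield "bb") if p else 3
        store("a", a)
        store("aa", a + 5)

    def sq_b():
        p = yield "p"
        b = 4 if p else (yield "aa")
        store("b", b)
        store("bb", b + 6)

    def sq_c():
        a = yield "a"
        b = yield "b"
        store("c", a + b)

    queue.append(("P", sq_p(), None))
    while queue:
        name, gen, slot = queue.popleft()
        trace.append(name)
        try:
            # A fresh SQ starts from the top; a restarted one receives the
            # value it faulted on.
            need = next(gen) if slot is None else gen.send(frame[slot])
            while need in frame:
                need = gen.send(frame[need])
        except StopIteration:
            continue  # the SQ ran to completion
        waiting.setdefault(need, []).append((name, gen, need))  # suspend
    return frame["c"], trace
```

For x > 0 the model runs P, A, B, C, A, C, matching the order described in the text; for x ≤ 0 no SQ suspends and the order is simply P, A, B, C.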
The  astute  reader  will  have  noticed  that  the  sample  procedure  could  be  reduced  to 

def abc x =
    if x > 0 then
        14
    else
        11;

Despite this possible compile-time reduction, the example is still relevant for two reasons:
First, the early stages of the compiler are not sophisticated enough to perform the reduction;
second, examples exist for which no such reduction is possible. For example, if in the original
program (Figure 2-6), the bindings for a and b were changed to a = f x bb and b = g x
aa, where f and g are passed in as parameters, no compile-time reductions would be possible
[Traub 1989, page 2].


*Throughout the text, partitions are simplified to provide a more intuitive understanding than would be
gained by going into the exact details on how a SQ is produced.

Figure 2-7: Scheduling Quanta for Unschedulable Example. The code in Figure 2-6 is divided
into four scheduling quanta. The calculations for b and bb appear in the same quantum because
bb depends only on b. It is impossible to determine statically whether SQ A or B executes
first. Arrows indicate that one SQ forks another.

Figure 2-8: Snapshots for Codeblock Example. This shows snapshots of the message queue
and frame before each SQ for the program in Figure 2-6.

[Figure 2-9 diagram: a time line across Processor A, the network, and Processor B. Processor A
initiates the get-context request, sends the argument, and uses the returned value. Processor B
allocates a local context, starts executing the procedure, resumes procedure execution, and
locally frees the context. The messages crossing the network are the context request, the
context value, the argument value, and the return value.]

Figure 2-9: Procedure Linkage Example. Processor A requests a context on processor B. As
soon as the frame is allocated, execution of the procedure call begins on B. When A receives
the context value, it can send the argument(s), after which B can complete. Shaded rectangles
indicate time that could be spent on other tasks. Note that those tasks are not interrupted
when data arrives.

2.3.2  Procedure  Calls 

Figure 2-9 shows how procedure linkage is done without tying up either processor. When
processor A wants to call a procedure on processor B, A must allocate a context (frame) on
B for the codeblock's arguments and scratch area. Allocating a context has the side effect of
starting execution of the first SQ in the procedure. After the address of the frame is returned
to A, it sends the arguments to B, which will have faulted if the data was already needed.
When the data arrives on B, suspended SQs are restarted. After B completes, it sends the
return value (if any) and a signal to A, and it frees its frame. Note that other processes can
execute while A and B are waiting for data.

While it would be more efficient in most cases for a caller to be able to send arguments at
the same time as requesting the context, there was no clean way to do this. An interesting
effect of this policy is that (as in other Id implementations) a procedure can conceivably
do substantial calculation or even return a value before receiving any arguments! This is
necessary because procedure calls in Id are non-strict.

Currently,  the  system  does  not  do  any  load-balancing,  and  it  always  spawns  procedures 
to  the  same  processor.  The  user  must  adjust  the  compiled  code  to  provide  a  distribution 
appropriate  to  the  problem. 

2.3.3  Loops 

As in all other implementations of Id, I provide a way for different iterations of a given loop
to be in progress at the same time. Because iterations of an outer loop execute on the same
processor, they do not execute concurrently; instead, the SQs of up to K iterations of a loop
are enabled at a time, where K is the loop-unfolding constant. When a calculation within one
iteration is waiting for something, such as the result of a procedure call to another processor,
instructions from other iterations may execute, subject to data dependences. Because up to K
iterations may be in progress at once, there must be K places to store each intermediate value,
so this method requires allocating K iteration areas. In [Iannucci 1988, Section 4.3.5], Iannucci
presents and proves the correctness of a method for dynamically unfolding loops which
guarantees the same results as sequential execution. I use his method, although I implement it
differently.

Concepts 

In Iannucci's method, an iteration includes the evaluation of the predicate and subsequent
execution of either the loop body or the loop termination code. He observes that for iteration
i to begin, three conditions must hold:

1. The predicate for iteration (i - 1) has been evaluated to "true".

2. The (i - K)th iteration has terminated, allowing us to reuse its iteration area.

3. The (i + 1 - K)th iteration must have already consumed its loop variables.

The third condition is the most subtle. It exists because iteration i will write the values of
loop variables into the slots of iteration i + 1. Hence, iteration i cannot execute until iteration
i + 1 - K is done with the values currently stored in these slots.
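As a worked instance of the three conditions, take K = 3 and i = 5: iteration 5 may begin only after the predicate of iteration 4 has evaluated to true, iteration 2 has terminated, and iteration 3 has consumed its loop variables. The check can be written as a small Python predicate. This is a sketch only; the function name and the use of sets to record progress are hypothetical, and treating conditions that refer to negative iteration numbers as satisfied is an assumption of the sketch.

```python
def may_begin(i, k, predicate_true, terminated, consumed):
    """Return True if iteration i may begin under the three conditions.

    predicate_true, terminated, and consumed are sets of iteration numbers.
    Conditions that refer to an iteration numbered below zero are treated
    as satisfied, since no such iteration exists (an assumption here).
    """
    def holds(j, done):
        return j < 0 or j in done

    return (holds(i - 1, predicate_true)      # condition 1
            and holds(i - k, terminated)      # condition 2
            and holds(i + 1 - k, consumed))   # condition 3


# K = 3, i = 5: iteration 3 has not yet consumed its loop variables.
blocked = may_begin(5, 3, {0, 1, 2, 3, 4}, {0, 1, 2}, {0, 1, 2})
ready = may_begin(5, 3, {0, 1, 2, 3, 4}, {0, 1, 2}, {0, 1, 2, 3})
```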
Figure 2-10: Possible Implementation of an Iteration Descriptor. The iteration fields hold the
offsets from the frame base of the next, current, and previous iteration areas. The Import and
PC flags tell whether this iteration may begin. Bits 26 through 31 are unused. This format
was not used.


These rules are enforced with two flags, PC and import. Iteration i's PC flag is set when
the first condition, that the predicate for iteration i - 1 is true, has been established. The
import flag is based on condition three; it is set when the next iteration area is ready to
import new loop variables. In [Iannucci 1988, pages 129-131], Iannucci proves that the rules
for the two flags cover all three conditions. When both of an iteration's flags are true, its first
SQ (presumably to compute the predicate) may be enabled.

Implementation 

Iannucci's hybrid architecture supports loops with several special-purpose instructions and
hardware support. Specifically, iteration descriptors, containing the two flags and pointers
to the previous, current, and next iteration areas, can be stored in one machine word. As
Figure 2-10 shows, it was possible to store all these quantities into the MDP's shorter (32-bit)
words, but, lacking hardware support for accessing these fields, shifting and masking were too
slow. Additionally, in the small amount of space available for each iteration pointer, it was
only possible to store offsets relative to the current frame, not absolute addresses, which would
be more convenient. Hence, I decided not to mimic the hybrid architecture's implementation,
and I developed my own data structures.
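To make the cost of the rejected format concrete, the sketch below packs an iteration descriptor into the low 26 bits of a word and reads a field back with a shift and a mask. It is written in Python for illustration. The only facts taken from Figure 2-10 are that the word holds three offsets and two flags and that bits 26 through 31 are unused; the 8-bit field width, the order of the fields, the flag positions, and all names are assumptions of this sketch.

```python
FIELD_BITS = 8                       # assumed width of each offset field
FIELD_MASK = (1 << FIELD_BITS) - 1
IMPORT_BIT = 1 << 24                 # assumed position of the import flag
PC_BIT = 1 << 25                     # assumed position of the PC flag
BOTH_FLAGS = IMPORT_BIT | PC_BIT


def pack(next_off, cur_off, prev_off, import_flag, pc_flag):
    """Pack three frame-relative offsets and two flags into one word."""
    for off in (next_off, cur_off, prev_off):
        if not 0 <= off <= FIELD_MASK:
            raise ValueError("offset does not fit in its field")
    word = next_off | (cur_off << FIELD_BITS) | (prev_off << 2 * FIELD_BITS)
    if import_flag:
        word |= IMPORT_BIT
    if pc_flag:
        word |= PC_BIT
    return word


def current_offset(word):
    """Every field access costs a shift and a mask."""
    return (word >> FIELD_BITS) & FIELD_MASK


def set_pc(word):
    return word | PC_BIT


def enabled(word):
    """An iteration may begin only when both flags are set."""
    return (word & BOTH_FLAGS) == BOTH_FLAGS


descriptor = pack(12, 8, 4, import_flag=True, pc_flag=False)
```

Under these assumed widths an offset can be at most 255, which also illustrates why only frame-relative offsets, and not absolute addresses, would fit.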

    FD of caller
    ISP of Argument Chain
    K (loop unfolding constant)
    Last Argument
      ...
    First Argument
    First Loop Constant
      ...
    Last Loop Constant
    First Scratch Slot
      ...
    Last Scratch Slot
    Pointer to Iter Area -1
      ...
    Pointer to Iter Area K
    Area for Iters 0 mod K
      ...
    Area for Iters K-1 mod K


Figure 2-11: A Loop Procedure Frame. Loop procedure frames have several sets of slots in
addition to those present in non-loop frames. Slot 2 holds K, the loop-unfolding constant. K
specifies how many iterations may be unrolled. There is space for loop constants, values that
could be hoisted out of the procedure's loop. Iteration areas are used for circulating variables
and each iteration's temporaries. The pointers allow quick access to each iteration area.


Figure 2-11 shows a frame for a procedure with a loop. In addition to the slots found in
non-loop frames (see Figure 2-3), it has slots for the loop-unfolding constant, loop constants,
iteration areas, and pointers to the iteration areas. Each iteration area's flags are stored
within its pointer. The pointers to iteration areas can be viewed in a more conceptual way in
Figure 2-12. In order to support iterations, an additional piece of data, an iteration number
between 0 and K - 1, must be included in every continuation. When a loop SQ begins, the
iteration number is used to find the pointer to the current iteration area. This pointer is
stored in MDP address register A1. Slots relative to the current iteration area can then be
indexed off A1. If it is necessary to access a slot in the previous or next iteration's area, the
iteration number is decremented or incremented to find the appropriate pointer from the table
of pointers within the frame. This is why there are K + 2 pointers to the K areas; for example,
if iteration 0 is active and wants to set the previous iteration's import flag, the pointer can be
retrieved without providing a special check for the boundary condition. The import and PC
flags are stored within the pointers.
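The effect of keeping K + 2 pointers can be seen in a few lines. The following Python sketch builds a pointer table in which the first entry aliases area K - 1 and the last entry aliases area 0, so the previous and next areas are found by plain indexing with no modulus or boundary test. The function names, base address, and area size are illustrative assumptions, and the flags that the real pointers carry are left out.

```python
def make_pointer_table(k, area_base, area_size):
    """Return K + 2 pointers; entry j + 1 belongs to iteration number j.

    Entry 0 plays the role of "iteration area -1" and entry K + 1 the role
    of "iteration area K"; they alias areas K - 1 and 0 respectively.
    """
    areas = [area_base + j * area_size for j in range(k)]
    return [areas[k - 1]] + areas + [areas[0]]


def area_pointers(table, iteration_number):
    """Return (previous, current, next) area addresses for 0 <= number < K."""
    slot = iteration_number + 1
    return table[slot - 1], table[slot], table[slot + 1]


# Three iteration areas of ten slots each, starting at address 100.
table = make_pointer_table(3, area_base=100, area_size=10)
prev0, cur0, next0 = area_pointers(table, 0)  # previous of area 0 is area 2
```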

As  an  example,  consider  the  procedure  in  Figure  2-13  to  sum  the  results  of  a  function 
evaluated  on  the  first  n  positive  integers.  The  circulating  loop  variables  are  count  and  total. 
Figure 2-12: Iteration Areas and Pointers. Pointers to the iteration areas are stored
contiguously from a known offset within the frame. Having K + 2 pointers to the K iteration
areas is an optimization: If the current iteration number is 0 and the need arises to access the
previous iteration area, the pointer can be found in a straightforward manner, i.e. by looking
one slot earlier than the pointer to the current iteration area. This eliminates costly boundary
condition checks. The PC and import flags, not shown, are packed into the high bits.


def combine n f =
    { total = 0
    in
      { for count <- 1 to n do
          next total = (f count) + total
        finally total }}

Figure 2-13: Loop Program Example. Procedure combine applies function f to the first n
positive integers, summing the results. For example, (combine 10 square) would return the
sum of the squares of the numbers from 1 to 10.
1. Initialize the K iteration pointers.

2. Set the import flag of each iteration area.

3. Set count to 1 and total to 0 in iteration area zero and reset area K - 1's import flag to
ensure that area zero gets to read count and total before they are written over.

4. Set area zero's PC flag, which will enable it, as the import flag is already set.

5. For each enabled iteration,

    (a) Compare count to n.

    (b) If count ≤ n then

        i. Write count + 1 into the first slot of the next iteration area and set its PC flag.

        ii. Spawn (f count).

        iii. Add the result of the previous step to total, writing the result to the total slot
        in the next iteration area.

        iv. Now done with all incoming circulating variables, set the previous iteration
        area's import flag.

    (c) If count > n then write the current value of total to a frame slot outside the iteration
    areas.

6. Once the final result has been written to the outside frame slot designated for the finally
value, pass it up to the caller.


Figure 2-14: Pseudo-Code Produced for Loop Example


Pseudo-code corresponding to the code that would be produced is shown in Figure 2-14.
Figure 2-15 illustrates how this scheme reveals possible parallelism. Up to K invocations of
f will execute at once. If f is slow, this is a big win.
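The pseudo-code of Figure 2-14 can be exercised with a small simulation. The following Python sketch is one possible reading of it, under assumptions that the figure does not state: an iteration's two flags are reset when it begins, the previous area's import flag is set as soon as the iteration has read both incoming variables (the total is copied into a temporary), calls to f complete in the order they were issued, and at least two iteration areas exist. All names are hypothetical, and the real implementation uses frame slots and SQs rather than dictionaries and function calls.

```python
from collections import deque


def combine(n, f, k=3):
    """Sum f(1..n) using k iteration areas; return (sum, peak pending calls)."""
    if k < 2:
        raise ValueError("this sketch needs at least two iteration areas")
    areas = [{"count": None, "total": None, "temp": None, "result": None,
              "pc": False, "import": True, "active": False, "last": False}
             for _ in range(k)]
    calls = deque()                      # pending calls: (area index, argument)
    stats = {"peak": 0, "final": None}

    def begin(j):
        """Start the iteration in area j if both of its flags are set."""
        area = areas[j]
        if not (area["pc"] and area["import"]):
            return
        area["pc"] = area["import"] = False      # assumed: reset on start
        area["active"] = True
        count = area["count"]
        area["last"] = count > n                 # step 5(a)
        if not area["last"]:
            nxt = (j + 1) % k
            areas[nxt]["count"] = count + 1      # step 5(b)i
            areas[nxt]["pc"] = True
            calls.append((j, count))             # step 5(b)ii
            stats["peak"] = max(stats["peak"], len(calls))
            consume(j)
            begin(nxt)
        else:
            consume(j)

    def consume(j):
        """Read the incoming total, then let the previous area import."""
        area = areas[j]
        if not area["active"] or area["total"] is None:
            return
        area["temp"], area["total"] = area["total"], None
        prev = (j - 1) % k
        areas[prev]["import"] = True             # step 5(b)iv
        if area["last"]:
            stats["final"] = area["temp"]        # step 5(c)
            area["active"] = False
        else:
            finish(j)
        begin(prev)

    def finish(j):
        """Once f has replied, write the new total into the next area."""
        area = areas[j]
        if area["temp"] is None or area["result"] is None:
            return
        nxt = (j + 1) % k
        areas[nxt]["total"] = area["result"] + area["temp"]  # step 5(b)iii
        area["temp"] = area["result"] = None
        area["active"] = False
        consume(nxt)

    areas[0]["count"], areas[0]["total"] = 1, 0  # step 3
    areas[k - 1]["import"] = False
    areas[0]["pc"] = True                        # step 4
    begin(0)
    while calls:                                 # replies arrive in order
        j, argument = calls.popleft()
        areas[j]["result"] = f(argument)
        finish(j)
    return stats["final"], stats["peak"]
```

With three iteration areas and the square function, combine(5, lambda v: v * v) returns the sum 55 and reports a peak of three pending calls to f, consistent with the claim that up to K invocations can be outstanding at once.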

The reader will observe that this scheme does not address nested loops. Those are pulled
out of procedures at compile-time and form new codeblocks that will be called by the original
procedure. Thus inner loops can execute in parallel on separate processors.

Because of a bug in the Id compiler's interaction with Iannucci's code, I was unable to
have my compiler support loops. (The version of the Id compiler currently used is different
from the one Iannucci wrote his system to interface with.) For my research, I hand-compiled
loop procedures to explore the different methods of implementation.
Figure 2-15: Snapshots for Loop Example. The snapshots show how the contents of the first
three iteration areas for the program in Figure 2-13 change over time. The first snapshot shows
the values in the iteration areas after they are initialized. The only non-empty locations are
the initial values for count and total in iteration area 0, which has been enabled, as indicated
by the darkened border. In the second snapshot, the first iteration has tested the predicate,
written an incremented count into the next iteration area, and has made the function call. In
the third snapshot, the second iteration does the same. Note that the function calls execute
in parallel.
2.4 Conclusion


Conventions were found to allow Id code to run on the J-Machine in the same style used by
Iannucci on the hybrid architecture. The benefits of this strategy are:

1. Frames allow dynamic dataflow, i.e. every invocation has its own data area.

2. SQs reduce the amount of necessary run-time scheduling.

3. Using multiple phases for instructions with unbounded latency frees the processor for
useful work.

4. Loop unrolling exposes and exploits parallelism.

These powerful techniques are supported at run-time by special data structures, fault handlers,
and library routines. The next chapter describes the compile-time work necessary to convert
from hybrid format to MDP format.
Chapter 3


Compilation 


I  have  heard  of  your  paintings  too,  well  enough; 

God  has  given  you  one  face, 
and  you  make  yourselves  another. 
You  jig,  you  amble,  you  lisp... 
—  William  Shakespeare,  Hamlet,  Act  III,  Scene  i,  line  150.


Because the MDP architecture is so different from the hybrid architecture, substantial
work must be done to create MDP code from hybrid code. Keeping with the philosophy of
the original Id compiler, described below, I perform my transformations in several stages.
The intermediate forms my compiler recognizes or produces are:

•  Hybrid  code. 

• Complex MDP code, machine instructions whose opcodes are the same as those on the
MDP (with a few extensions) but whose addressing modes, etc., are not legal.

•  Simple  MDP  code,  s-expressions  of  legal  MDP  instructions. 

•  MDP  assembly  code. 

My  back-end  converts  from  the  first  form  to  the  last.  The  rest  of  the  chapter  describes  this 
process. 
Figure 3-1: Structure of the Id-to-MDP Compiler: Plain roman text indicates modules of the
original Id-to-hybrid compiler, italics indicate modules I changed, and bold indicates modules
I added. Program graphs are a form of dataflow graph. This picture is modeled after one in
[Iannucci 1988, page 97].


The original Id compiler is written in Common Lisp and is based on the Dataflow Compiler
Substrate [Traub 1986b], a set of abstractions for building modular compilers. Each module
inputs and outputs a stream of Lisp objects (except for the first and last modules which only
emit or collect, respectively). Figure 3-1 shows how my modules fit on top of the Id compiler.
Figure 3-2 shows the formats of instructions flowing through all of the new or changed stages.
They will be explained in more detail below. The appendices contain complete listings of the
files I created.
[Figure 3-2 table. Columns: Type of Stream, Description. Descriptions of the successive streams:]

• Dataflow Graph Nodes

• Iannucci's internal hybrid format

• A stream of hybrid instructions

• Instructions with MDP operators (or one of a few pseudo-ops) but illegal operands

• Legal MDP instructions in s-expression form

• Legal MDP assembly code


Figure 3-2: New and Modified Compiler Stages: Dataflow code flows through several stages
in order to become MDP assembly code. The term "VND" is used to distinguish Iannucci's
internal representation of code from my "hybrid" format. The ellipses between the first two
stages indicate that other stages go between them.
3.1 Changes to Machine Code Generation


The machine code generation module, called generate-vnd-instructions and written by
Iannucci, takes program graph instructions and converts them to hybrid instructions. In some
cases, such as for arithmetic instructions, the transformation is trivial. For conditionals, loops,
and procedure calls, however, a single program graph instruction expands into many hybrid
instructions. Because my control structure transformations for loops and procedure linkage
differ from Iannucci's, I wrote a file changes.lisp that replaced his templates for loops and
procedure calls with my own.

3.1.1  Loops 

Originally, for the loop program graph instruction, instructions were generated to support the
hybrid architecture's implementation of loops. Section 2.3.3 describes how my implementation
differs. I emit different hybrid instructions for the loop set-up instruction to initialize the
iteration area pointers. Code within loop SQs is passed through unchanged, to be converted
in later stages of the compiler, as only structural changes are made in this module.

3.1.2  Procedure  Calls 

Section 2.3.2 described my multi-phase convention for procedure linkage, but it glossed over
a few details. Specifically, my implementation differs from the hybrid one in an important
way: On the hybrid architecture, the get-context instruction calls a local manager that selects
a frame on another processor where the procedure can be spawned [Iannucci 1988, page 174].
This requires a processor to know memory usage on other processors. When designing the
system for the J-Machine, I decided each processor should know as little as possible about the
other processors, particularly because the J-Machine is massively parallel. One consequence
was that I rejected this scheme. Instead, I changed the protocol so that get-context is a
two-phase instruction, where the calling node, A, asks the called node, B, for a frame address.
The complete calling protocol is:

1. Execute a get-context instruction on A. This sends a request to processor B to allocate a
frame and start execution of the appropriate procedure, and to send the frame descriptor
F back to processor A.

2. Compute the return location for the procedure call (an offset into the current frame)
and send it to B, attached to F. Because F is the frame descriptor, B will know where
to put the return location.

3. Send each of the arguments to B, attached to F.

If get-context were merely local, no data faults would occur during the first three steps; hence,
the value for the return location could be written into a register instead of a more permanent
place like a frame slot. In my strategy, a fault will occur during step 2 because F is not locally
available yet. Hence, I must insert a suspensive check for F before the second step. This way,
it will be safe to store the return location into a register. There will be no danger that a fault
will occur on F between the time the register is written and when the register is accessed to
send its value to B. (The values in registers are not guaranteed between suspensions, and it
would have been too difficult for me to change the hybrid compiler's frame allocation.)

Even this is not the whole story. Consider a doubly-recursive procedure like a naive
implementation of Fibonacci. Figure 3-3 shows the code that would be produced by the
J-Machine strategy just described. The problem with this code is that the second get-context
request would not be made until after the first one returns. This introduces unnecessary
dependences, as it implies that steps 5-8 in the figure cannot occur until steps 1-4 are finished.
This was not a problem on the hybrid architecture, where it was known that steps 1-4 would
not suspend. Because step 2 will suspend, steps 5-8 will be delayed unnecessarily. This is
illustrated in Figure 3-4. The arrow indicates the short-cut that exists: The second request
can be started immediately after the first. Hence, before the suspensive check, we add an
instruction to fork a continuation corresponding to whatever follows the procedure call,
essentially splitting the SQ.
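The benefit of the fork can be put in numbers with a deliberately simple timing model, written here in Python. It assumes that each get-context request has a fixed round-trip time, that instruction execution time is negligible, and that without the fork the suspensive check holds back the code that issues the next request. The function names and the latency of 40 time units are arbitrary illustrations, not measurements.

```python
def issue_times(latencies, fork):
    """Time at which each get-context request leaves the caller.

    latencies[i] is the assumed round-trip time of request i. Without a
    fork, the suspensive check for one frame descriptor delays the next
    request; with a fork, the next request is issued right away.
    """
    times, now = [], 0
    for latency in latencies:
        times.append(now)
        if not fork:
            now += latency  # wait here for the frame descriptor
    return times


def all_frames_known(latencies, fork):
    """Time by which every frame descriptor has arrived at the caller."""
    issued = issue_times(latencies, fork)
    return max(t + latency for t, latency in zip(issued, latencies))


# Two recursive calls, as in the Fibonacci example.
sequential = all_frames_known([40, 40], fork=False)
forked = all_frames_known([40, 40], fork=True)
```

In this model the unforked code learns both frame descriptors after 80 time units and the forked code after 40, since the two requests overlap.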


3.2  Assembling  Hybrid  Code 

The last stage of Iannucci's compiler is an assembler that converts his internal representation
of hybrid code into one suitable for his interpreter. I modified this stage to produce a stream
of hybrid instructions suitable for my stages.
1. Execute get-context for the first recursive call. The value for the frame F1 will be
returned at some unknown time.

2. Make a suspensive reference to F1, so that we can't get to the next step unless it has
arrived.

3. Compute the return location for the first procedure call and send it to B1 attached to
F1.

4. Send the arguments to B1 attached to F1.

5. Execute get-context for the second recursive call. The value for the frame F2 will be
returned at some unknown time.

6. Make a suspensive reference to F2, so that we can't get to the next step unless it has
arrived.

7. Compute the return location for the second procedure call and send it to B2 attached
to F2.

8. Send the arguments to B2 attached to F2.


Figure 3-3: A Non-Optimal J-Machine Calling Convention. B1 and B2 represent the two
processors on which the subprocedures are spawned. The code is non-optimal, because F2
would not be requested until after F1 had been received.
Figure 3-4: The Ordering Specified by Successive Function Calls. Unless the first instruction
explicitly forks the second request, as shown by the arrow, code will execute sequentially as
indicated by the plain lines. This unnecessarily lessens the amount of exploited parallelism.

3.3  Convert  Hybrid  to  Complex  J 

“Complex  J”  code  is  an  intermediate  format  that  is  relatively  easy  to  produce  from  hybrid 
code.  The  steps  for  converting  an  instruction  are: 

1.  If  any  operand  is  suspensive, 

(a)  Emit:  (suspensive-instruction) 

(b)  For  every  possibly-suspensive  operand  s,  emit:  (suspensive-operand  s) 

(c)  Emit:  (suspensive-check-done) 

2. Convert all references to hybrid general-purpose registers to references to temporary
storage on the MDP.

3.  Emit  code  specified  by  the  template  corresponding  to  the  hybrid  instruction. 

Below, I describe the different templates for classes of hybrid instructions, in order to provide
a deeper understanding of the hybrid instruction set as well as of the transformation process.
In this section, I go into considerable detail. Readers are prewarned, lest they fall off the
bottom of this depth-first search. Casual readers may wish to read the first few templates
and then skip to the conclusion of this section.

3.3.1  Label  Instruction 

The  template  for  converting  a  label  instruction  is: 

(defconversion label :label (label-name)
  `((label ,label-name)
    (move (:message (:base 1)) (:j-register A2))))

The first line generates an MDP label with the same name as the hybrid label. The second line
says to move the value at offset one from the current message, i.e. the frame address, into
MDP address register A2.* That line is there because execution can begin at any label, and
A2 is always assumed to hold the base-of-frame pointer.

This example illustrates one of the differences between complex and simple MDP code:
On the J-Machine, one of the operands of a move must be a general-purpose register. The
above move will be broken into two moves in the next stage, convert-cj-to-sj. At this stage,
we do not have to concern ourselves with such details.

3.3.2  Simple  Arithmetic  Instructions 

The  template  for  converting  an  arithmetic  instruction  such  as  add  is: 

(defconversion j-add :+ (s1 s2 d)
  (append (lookup-into d)
          `((add ,s1 ,s2 ,d))))

The lookup-into routine generates code to restart any continuations waiting for a value to
be written to location d, as described in Section 2.2.4. First, the slot number is copied into
R1, then the library routine lookup-vector is called.† Figure 3-5 shows the conversion of an
addition instruction.

*In the hybrid and MDP assembly formats, (move A B) moves the contents of A into B, not vice versa.

†In retrospect, explicitly mentioning the register to pass the argument in at this stage is an unnecessary
violation of abstraction.
(:add (:frame (:base 6) :suspensive)
      (:literal (:integer 1))
      (:frame (:base 7)))

        =>

(suspensive-instruction)
(suspensive-operand (:frame (:base 6)))
(suspensive-check-done)
(move 7 (:j-register R1))
(call lookup-vector)
(add (:frame (:base 6))
     (:literal (:integer 1))
     (:frame (:base 7)))


Figure 3-5: The Hybrid-to-Complex-J Conversion of an Addition. Execution will only get
past the suspensive-operand virtual instruction if slot 6 of the current frame is present.
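The conversion steps of Section 3.3, applied to the addition of Figure 3-5, can be mimicked in a few lines. The following Python sketch is only an illustration of the data flow: the actual stage is written in Common Lisp with defconversion templates, instructions are modeled here as nested tuples, the names convert_add and lookup_into are stand-ins chosen for this sketch, and step 2 (rewriting hybrid general-purpose registers) is omitted because the example has none.

```python
def lookup_into(dest):
    """Emit code to restart continuations waiting on a frame-slot destination."""
    if dest[0] != ":frame":
        return []
    slot = dest[1][1]  # dest looks like (":frame", (":base", slot))
    return [("move", slot, (":j-register", "R1")),
            ("call", "lookup-vector")]


def convert_add(instruction):
    """Convert one hybrid :add instruction to a list of complex-J instructions."""
    op, s1, s2, d = instruction
    if op != ":add":
        raise ValueError("this sketch only handles :add")

    def plain(operand):
        # Drop the :suspensive marker; the check is emitted separately.
        return operand[:2] if ":suspensive" in operand else operand

    suspensive = [plain(o) for o in (s1, s2, d) if ":suspensive" in o]
    out = []
    if suspensive:                                      # step 1
        out.append(("suspensive-instruction",))
        out += [("suspensive-operand", s) for s in suspensive]
        out.append(("suspensive-check-done",))
    s1, s2, d = plain(s1), plain(s2), plain(d)
    return out + lookup_into(d) + [("add", s1, s2, d)]  # step 3


hybrid = (":add",
          (":frame", (":base", 6), ":suspensive"),
          (":literal", (":integer", 1)),
          (":frame", (":base", 7)))
complex_j = convert_add(hybrid)
```

The resulting list has the same six instructions, in the same order, as the lower half of Figure 3-5.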


3.3.3  Complicated  Arithmetic  Instructions 

Some  arithmetic  instructions  are  more  complicated,  such  as  abs,  min,  and  max,  because  they 
are  machine  instructions  on  the  hybrid  architecture  but  not  on  the  J-Machine.  Thus  they  have 
larger  templates  that  use  temporary  registers.  Figure  3-6  shows  the  template  for  abs.  The 
reserve  and  free  pseudo-ops  tell  the  next  stage  of  the  compiler  where  MDP  registers  should 
be  allocated.  Without  this  facility,  the  conversion  of  templates  requiring  temporary  storage 
would  be  much  less  efficient.  They  will  be  discussed  in  more  detail  in  the  section  on  the  next 
stage  of  the  compiler. 

3.3.4  Move  Instructions 

The  template  for  converting  a  move  instruction  is: 

(defconversion move :move (source dest)
  (append (lookup-into dest)
          `((move ,source ,dest))))

If the destination is a frame slot, this generates code to restart any continuations waiting on
the value and then performs the move.

(defconversion j-abs :abs (s d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ash ,s -31 (:register scratch1))
            (xor ,s (:register scratch1) (:register scratch2))
            (sub (:register scratch2) (:register scratch1) ,d)
            (free (:register scratch1))
            (free (:register scratch2)))))


Figure 3-6: The Template for Converting Absolute Value. Two scratch registers must be
reserved for the optimal absolute value strategy. They are used for temporary values and are
freed at the end of the template. The reserve and free are instructions to later stages of the
compiler and do not directly produce any code.

The  move-remote  instruction  moves  a  value  into  a  slot  of  another  frame.  Its  template  is: 

(defconversion movr :move-remote (frame-ptr offset value)
  `((send0 ,frame-ptr)            ; Node number
    (send0 (:ref local.movr))     ; MSG word
    (send0 ,frame-ptr)            ; First argument: frame descriptor
    (send0 ,offset)               ; Second argument: offset within frame
    (sende0 ,value)))             ; Third argument: value to write


On the J-Machine, the first word of a send sequence is a number specifying the destination
node.  The  second  word,  the  message  header,  specifies  both  how  long  the  message  is  and  the 
address  of  the  handler  to  receive  it.  The  meaning  of  subsequent  words  is  determined  by  the 
handler. 

To understand the above template, recall from Section 2.2.3 that the node number is
stored in the low sixteen bits of the frame descriptor. Because the router only looks at the
low sixteen bits, sending the frame descriptor specifies the correct destination node. When
the message reaches that node, execution will begin at the local.movr library routine, which
writes the passed value into the specified slot after checking if any continuations are waiting.
The move-remote instruction is typically used for passing arguments and return values.
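
The reason the frame descriptor can double as the routing word can be illustrated with a small Python sketch. It is only a model under stated assumptions: the function names are hypothetical, the handler is represented by its name rather than by a real message header, and only the layout (local address in the high sixteen bits, node number in the low sixteen) comes from the text.

```python
NODE_BITS = 16
NODE_MASK = (1 << NODE_BITS) - 1


def make_fd(local_address, node):
    """Pack a frame descriptor: local address high, node number low."""
    return (local_address << NODE_BITS) | (node & NODE_MASK)


def route(first_word):
    """Model of the router: only the low sixteen bits of the first word matter."""
    return first_word & NODE_MASK


def move_remote_message(fd, offset, value, handler="local.movr"):
    """Words produced by the move-remote template, in sending order.

    The handler string stands in for the real header word, which also
    encodes the message length.
    """
    return [fd, handler, fd, offset, value]
```

For instance, a descriptor built with `make_fd(0x0123, 42)` yields a message whose first word routes to node 42, while the same word, sent again as the first argument, still carries the local address 0x0123 for the handler to use.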
(:test-1 (:frame (:base 6) :suspensive)
         (:frame (:base 8)))

        ⇓

(suspensive-instruction)
(suspensive-operand (:frame (:base 6)))
(suspensive-check-done)
(move 8 (:j-register R1))
(call lookup-vector)
(move true (:frame (:base 8)))


Figure 3-7: The Hybrid-to-Complex-J Conversion of a Test-1. Despite the template's apparently
ignoring the source, the instruction is converted correctly. Before the template is even
considered, code is emitted to check for the suspensive operand.


3.3.5  Test  Instructions 

The  hybrid  architecture  includes  the  test-1  and  test-2  instructions  to  write  true  into  the 

destination  if  the  source(s)  are  present.  Execution  should  suspend  if  any  source  is  unavailable. 

The  template  for  test-1  is  simply: 

(defconversion tst1 :test-1 (s1 dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))

The  transformation  for  test-2  is  identical.  The  simplicity  lies  in  how  the  converter  handles 
suspensive arguments: before the template stage is even reached, code will have been emitted
to  check  suspensive  operands  and  to  suspend  if  they  are  not  present.  Figure  3-7  shows  the 
conversion  of  a  test-1  instruction. 


3.3.6  Continuation  Instructions 

Two  hybrid  instructions  exist  to  fork  continuations.  They  are  used  to  start  SQs  within  a 

codeblock.  The  template  for  the  continue  instruction  is: 

(defconversion cntn :continue (cont)
  `((send0 (:j-register NNR))
    ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
    (send0 (:ref ,(second (second cont))))
    (sende0 (:j-register A2))))

This sends a message from a processor to itself (the NNR register holds a processor's own
node number), along with the specified SQ base and the current frame pointer, kept in A2.

While  the  continue  instruction  is  sufficient,  it  is  non-optimal,  in  that  the  new  continuation 
is  likely  to  immediately  suspend  on  the  first  value  it  checks  for.  With  this  observation, 
Iannucci designed the continue-test instruction, which tests the first slot accessed by the new
SQ. If the value is there, the continuation is forked as above; otherwise, a local continuation
is  immediately  created  and  stored  in  the  appropriate  slot.  This  saves  a  message  send  in  the 
worst  —  and  most  common  —  case.  The  conversion  template  is: 

(defconversion cntt :continue-test (check-slot cont)
  ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
  `((move (:ref ,(second (second cont))) (:j-register R0))
    (move (:literal ,(frame-base-offset check-slot)) (:j-register R1))
    (call (:literal ,cntt-vector))))


This calls a local library routine, cntt, that does the check and, depending on whether or not
the data is present, either sends the message or stores the continuation. The cntt routine
expects R0 to hold the SQ address and R1 to hold the number of the needed slot.
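
The decision made by the cntt routine can be sketched as follows. This is a Python model, not the library code: the `EMPTY` sentinel, the list-based frame, and the `send` callback are all assumptions introduced to stand in for presence tags, frame memory, and the self-addressed message send.

```python
EMPTY = object()  # stands in for a slot whose presence tag marks it empty


def continue_test(frame, slot, sq, send):
    """Fork `sq` if frame[slot] already holds data; otherwise park it there.

    `send` models the message a processor sends to itself to fork an SQ.
    """
    if frame[slot] is EMPTY:
        # The SQ would suspend at once, so skip the message and store a
        # continuation directly in the slot it is waiting on.
        frame[slot] = ("continuation", sq)
        return "stored"
    send(sq)
    return "sent"
```

The saving described in the text shows up in the first branch: when the slot is empty, no message is sent at all.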


3.3.7  Procedure  Linkage  Instructions 

The procedure linkage convention was described in great detail in Sections 2.3.2 and 3.1.2.

Briefly,  there  are  three  steps  to  spawning  a  procedure: 

1. Initiate a get-context request, sending the codeblock descriptor and the address of where
to write the new context pointer.

2. Use index-current-context to create a new global address for return values to be sent to.
For example, if the first return value should be sent to slot 8, index the current context
by 8.

3. Perform remote moves to transfer the indexed context and the arguments into the newly
allocated frame.

The third step uses the move-remote instruction described earlier. The transformations for
get-context and index-current-context for the first two steps are described here.

Get-Context The transformation for the get-context instruction appears in Figure 3-8.
Rather than try to explain it here, I have added detailed comments to the code. As mentioned
earlier, no attempt at load balancing is made by the compiler. A library routine, get-context,
resides on every processor to use the information sent and to perform the callee's half of the
protocol.


Index-Current-Context The Index-Current-Context instruction is slightly more complicated.
By convention, the n return values of a procedure are sent to the first n slots of the
calling frame. Because we really never want the return values sent to the start of the current
frame, we increment the current context and send that value to the callee instead. The
template is shown in Figure 3-9.
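
The effect of indexing the context can be illustrated with a short Python sketch. It models the convention only: memory is a dictionary, one word per frame slot is assumed, and the function names are hypothetical. The bit-level tagging and shifting done by the real template are deliberately left out.

```python
def make_fd(local_address, node):
    """Frame descriptor: local address in the high 16 bits, node in the low 16."""
    return (local_address << 16) | (node & 0xFFFF)


def index_current_context(frame_address, node, base):
    """Descriptor for the current frame displaced by `base` slots."""
    return make_fd(frame_address + base, node)


def callee_write(memory, fd, offset, value):
    """What a callee's move-remote does on arrival: write relative to the FD."""
    memory[(fd >> 16) + offset] = value
```

With a caller frame at address 100 indexed by 8, a callee that writes its completion signal to offset 0 and its result to offset 1 ends up filling slots 8 and 9 of the caller's frame, without knowing anything about the caller's layout.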


3.3.8  Conclusion 


In the convert-hybrid-to-cj stage of the compiler, hybrid instructions are transformed into complex
J-Machine code. The transformations ignore the intricacies of MDP addressing modes,
making the transformation process simpler and more conceptual. Several pseudo-operators
for handling suspensive instructions and register allocation are used.

(defconversion getc :get-context (context-slot return-slot)
  ;; The first scratch register will be used to hold the global
  ;; frame descriptor of the calling frame, so that the callee
  ;; knows where to send the context value back to. Recall that
  ;; the format of a FD is that the local address is in the high
  ;; sixteen bits, and the node number is in the low sixteen.
  `((reserve (:register scratch))
    ; Take the local address of the current frame from A2.
    (move (:j-register A2) (:register scratch))
    ; Tag it as an integer (instead of an address) so we can munge it.
    (wtag (:register scratch)
          (:literal ,int-tag)
          (:register scratch))
    ; Shift it over 16, to fit into FD format.
    (lsh (:register scratch)
         (:literal ,(- 16 *sys-len-bits*))
         (:register scratch))
    ; Add in the local node number (i.e. put it in low 16 bits).
    (add (:register scratch) (:j-register NNR) (:register scratch))
    ; Tag it as a FD.
    (wtag (:register scratch) (:literal ,fd-tag) (:register scratch))
    (send0 (:literal 1))             ; Send to node 1 always
    (send0 (:ref local.getc))        ; Handler is the local.getc lib routine
    (send0 ,context-slot)            ; Send the codeblock descriptor.
    (send0 (:register scratch))      ; Send the current FD, so it knows
    (free (:register scratch))       ; where to send the context back to.
    (sende0 ,(frame-base-offset return-slot)))) ; Send the return offset.

Figure 3-8: Transformation of the Get-Context Instruction to MDP Code. The purpose of the
get-context instruction is to send off a request to allocate a context and return its value.

(defconversion ixcc :index-current-context (frame-base dest)
  (append (lookup-into dest)
          ; A scratch register is needed
          `((reserve (:register scratch))
            ; Move the local frame address into the scratch register
            (move (:j-register A2) (:register scratch))
            ; Tag it as an integer so we can adjust it
            (wtag (:register scratch)
                  (:literal ,int-tag)
                  (:register scratch))
            ; Add in the new base, shifted over into the address
            ; portion of the instruction
            (add (:register scratch)
                 (:literal ,(* (literal-base-offset frame-base)
                               (expt 2 *sys-len-bits*)))
                 (:register scratch))
            ; Shift the sum into the top half of the word
            (lsh (:register scratch)
                 (:literal ,(- 16 *sys-len-bits*))
                 (:register scratch))
            ; Add the local node number into the low half of the word
            (add (:register scratch)
                 (:j-register NNR)
                 (:register scratch))
            ; Tag it as a frame descriptor
            (wtag (:register scratch)
                  (:literal ,fd-tag)
                  (:register scratch))
            ; Move it into the specified destination.
            (move (:register scratch) ,dest)
            ; Free the scratch register.
            (free (:register scratch)))))


Figure 3-9: Transformation of Index-Current-Context. The purpose of index-current-context
is to take the address of the current frame, conceptually add a constant offset to it, and
convert it to frame descriptor format. It can then be sent to a spawned procedure as the frame
to return results to.

3.4 Convert Complex J to Simple J

This stage is the most complex of the new modules. Its tasks include:

1.  Converting  literal  operands  into  tagged  literals. 

2.  Converting  the  suspensive-instruction,  suspensive-operand,  and  suspensive-check-done 
pseudo-ops  into  MDP  code. 

3.  Allocating  and  substituting  MDP  registers  where  they  were  requested  with  the  reserve 
and  free  pseudo-ops. 

4.  Adjusting  instructions  to  use  legal  MDP  addressing  modes. 

We  will  examine  each  of  these  stages. 

3.4.1  Converting  Literals  to  Tagged  Literals 

Because all values on the MDP are tagged, references to literals must be changed to tagged
literals. The integer literal operands from the addition example in Figure 3-5 would both be
converted:

7                       → (:tagged-literal int-tag 7)
(:literal (:integer 1)) → (:tagged-literal int-tag 1)

Booleans  and  labels  are  similarly  transformed. 
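
As an illustration of this pass, the following Python sketch rewrites integer and boolean operands into tagged form. The tuple encoding of s-expressions and the function name are assumptions made for the example; only the input and output shapes are taken from the conversions listed above.

```python
def to_tagged_literal(operand):
    """Rewrite an integer or boolean literal operand as a tagged literal."""
    if isinstance(operand, bool):          # checked first: bool is a kind of int
        return (":tagged-literal", "boolean-tag", int(operand))
    if isinstance(operand, int):           # a bare integer such as 7
        return (":tagged-literal", "int-tag", operand)
    if operand[0] == ":literal" and operand[1][0] == ":integer":
        return (":tagged-literal", "int-tag", operand[1][1])
    return operand                         # frame slots, registers, etc. pass through
```

Operands that are not literals, such as frame slots, are returned unchanged, so the function can be mapped over every operand of an instruction.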

The other type of “literal” used is a reference: a constant whose value is determined
at assemble-time [Horwat and Totty 1987, page 9]. References are used to denote codeblock
pointer values, addresses of suspensive instructions, and branch destinations. These are
denoted with the imaginary tag name, “special-tag”. These operands are converted to MDP
reference format in the last stage of the compiler.

3.4.2  Generating  Suspensive  Code 

Before  a  suspensive  instruction,  several  things  must  be  done  to  ensure  proper  behavior: 

1. Store the current instruction pointer location into R0, so if a fault occurs, the handler
will know where execution should resume.

2. Because execution could be resumed here, SQ setup code must be emitted to load
the frame base address into MDP register A2, i.e. (move (:message (:base 1))
(:j-register A2)).

3. Check whether each suspensive operand is present, faulting if not.


(suspensive-instruction)
(suspensive-operand (:frame (:base 6)))
(suspensive-check-done)

        ⇓

(label (:tagged-literal special-tag (:label suspensive19)))
(dc (:tagged-literal special-tag :suspensive19))
(move (:message (:base 1)) (:j-register A2))
(rtag (:frame (:base 6)) (:j-register R3))


Figure 3-10: Intermediate Code Produced for Suspensive Pseudo-Operands. The DC (“data
constant”) instruction loads its assemble-time constant operand into R0. If the rtag (“read
tag”) instruction faults, the handler can use the R0 value to know where execution should
restart, as described in Section 2.2.4.


For the rationale behind these rules, refer back to Section 2.2.4, where the continuation format
was described. Figure 3-10 shows the conversion of the suspensive pseudo-ops in the add
instruction introduced in Figure 3-5. First, a unique label, created with the Lisp procedure
gensym, is emitted. A reference to it is loaded into R0 with the DC (“data constant”)
instruction. The frame base is loaded into A2, after which the tag of the suspensive operand
is read. If it faults, the run-time handler described in Section 2.2.4 will set up a continuation.
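
The expansion just described can be sketched in Python as a function from the list of suspensive operands to the emitted instruction sequence. The tuple encoding, the counter-based `gensym`, and the use of R3 as the rtag destination (taken from Figure 3-10) are illustrative assumptions, not the compiler's actual code.

```python
import itertools

_counter = itertools.count(1)


def gensym(prefix="suspensive"):
    """Return a fresh label name, in the spirit of the Lisp gensym."""
    return f"{prefix}{next(_counter)}"


def expand_suspensive(operands):
    """Emit the label, restart-address load, frame setup, and tag reads."""
    label = gensym()
    code = [
        ("label", label),
        ("dc", label),  # loads the restart address into R0
        ("move", (":message", (":base", 1)), (":j-register", "A2")),
    ]
    for operand in operands:
        # Reading the tag faults if the operand is not present.
        code.append(("rtag", operand, (":j-register", "R3")))
    return code
```

An instruction with two suspensive operands, such as test-2, would simply receive two rtag instructions after the same three-instruction prologue.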

Although it would be more efficient not to explicitly read the tags of the suspensive
operands, it is necessary if the hybrid instruction has side effects. For example, a destination
might be written or a message might be sent before a specific suspensive operand was
accessed. A later version of this compiler would optimize out the “read tag” instructions in
cases where the implicit check, made when the operand is actually accessed, would suffice.

3.4.3 Allocating MDP Registers

MDP registers have two uses: passing arguments to system calls and holding temporary values
within  hybrid  instructions.  When  used  for  system  calls,  they  are  explicitly  referred  to  as  in 
Figure  3-5  earlier.  When  they  are  used  as  temporaries,  generally  it  does  not  matter  which  of 
the  four  MDP  general-purpose  registers  is  used.  The  reserve  and  free  pseudo-ops  generated 
by  the  templates  in  convert-hybrid-to-cj  are  used  to  create  and  destroy  bindings  of  symbols 
to  MDP  registers.  For  example, 

(reserve (:register scratch))

binds scratch to a free MDP register. Until a

(free (:register scratch))

is encountered, all occurrences of (:register scratch) are converted to (:j-register Rn),
where n is the register bound to scratch. Because no more than four temporary registers are
ever needed, no spilling needs to be done.

The only conflict arises because R0 is different from the other GPRs. The MDP instruction
DC loads a 32-bit quantity into R0.* Except for a few special values, only 7-bit quantities can
be specified as constants to move directly into the other registers. Thus there is an internal
compiler routine, request-appropriate-register, that takes an argument specifying what will go
in the register and returns a binding to an appropriate register (i.e. R0 if the argument is
a big value, another register otherwise). If R0 has already been allocated, an instruction to
move the old contents of R0 into another register is generated, and the previous binding to
R0 is changed. This process is illustrated in Figure 3-11.*
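
A minimal Python sketch of this one-pass scheme follows. It is not the compiler's routine: the class and method names are hypothetical, the "7-bit" test is assumed to mean the signed range [-64, 63], and the hand-out order R3, R2, R1 is chosen only so that the sketch reproduces the sequence in Figure 3-11.

```python
class RegisterAllocator:
    OTHERS = ["R3", "R2", "R1"]  # assumed hand-out order for non-R0 registers

    def __init__(self):
        self.bindings = {}   # symbolic name -> MDP register
        self.emitted = []    # instructions generated as a side effect
        self._next = 91      # stand-in for gensym numbering

    def _free_other(self):
        used = set(self.bindings.values())
        return next(r for r in self.OTHERS if r not in used)

    def request(self, value):
        """Bind a fresh name to a register suited to holding `value`."""
        name = f"reg{self._next}"
        self._next += 1
        if -64 <= value <= 63:               # small enough to move directly
            self.bindings[name] = self._free_other()
            return name
        holder = next((n for n, r in self.bindings.items() if r == "R0"), None)
        if holder is not None:               # R0 is taken: relocate its contents
            target = self._free_other()
            self.emitted.append(("move", "R0", target))
            self.bindings[holder] = target
        self.bindings[name] = "R0"           # big values must arrive via DC
        return name

    def free(self, name):
        """Release the register bound to `name`."""
        del self.bindings[name]
```

Requesting registers for 100, 11, and 500 in that order leaves the first binding in R2, the second in R3, and the third in R0, with a single move from R0 to R2 emitted along the way.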

3.4.4  Converting  to  Legal  MDP  Operands 

Instructions on the MDP are only 17 bits long. While this permits tight packing and quick
loading, it limits the operand space. Specifically, general-purpose registers are required as
operands to certain instructions, and only very short constants can be encoded in instructions.

*DC is more accurately an assembler pseudo-op. It must have a constant value for its operand which is
then put directly into the instruction stream. During execution, if the instruction pointer is at something that
is not tagged as an instruction, it is loaded into R0. This allows 32-bit values to be directly loaded into a register,
despite the normal 17-bit instruction length.

*Nate Osgood helped me develop this one-pass register allocation scheme.

Request                               Bindings       Code Emitted
------------------------------------  -------------  ------------------------
(request-appropriate-register 100)    reg91 -> R0
  returns reg91

(request-appropriate-register 11)     reg91 -> R0
  returns reg92                       reg92 -> R3

(request-appropriate-register 500)    reg91 -> R2    (move (:j-register R0)
  returns reg93                       reg92 -> R3          (:j-register R2))
                                      reg93 -> R0

Figure 3-11: Compiler Register Allocation. Requests for registers and the return values
are shown in the leftmost column. The binding names are generated by the Lisp gensym
procedure. The middle column shows the internal set of bindings after each instruction. A
conflict arises on the third request where R0 is needed but is already part of another binding,
reg91. The register allocator emits code to move whatever has been placed in R0 into a
previously-free register, R2. The binding for reg91 is then changed to R2, and the new
request can get R0.

Consider  the  following  hybrid  instruction: 

(:add (:frame (:base 6))
      (:literal (:integer 30))
      (:frame (:base 7)))


There  are  two  reasons  why  it  cannot  be  encoded  into  one  MDP  three-operand  instruction: 

1. The first and last operands must be general-purpose registers.

2. If the second operand is a constant, it must be in the range [-16, 15].

The above add instruction would be translated into four MDP instructions:

(move (:frame (:base 6))
      (:j-register R3))
(move (:tagged-literal int 30)
      (:j-register R2))
(add (:j-register R3)
     (:j-register R2)
     (:j-register R3))
(move (:j-register R3)
      (:frame (:base 7)))


The astute reader will have observed that if the order of the source operands were changed,
they could be encoded into one fewer MDP instruction. I did not have time to incorporate this
optimization for commutative instructions.
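
The two operand rules can be turned into a small legalization sketch. The Python below is an illustration under assumptions: instructions are tuples, scratch registers are handed out as R3 then R2 to match the four-instruction listing above, and the function name is hypothetical.

```python
def legalize_add(src1, src2, dest):
    """Rewrite a three-operand add so it obeys the two MDP operand rules."""
    code = []
    free = ["R2", "R3"]  # pop() hands out R3 first, then R2

    def into_register(operand):
        reg = (":j-register", free.pop())
        code.append(("move", operand, reg))
        return reg

    # Rule 1: the first operand must be a general-purpose register.
    first_was_temp = src1[0] != ":j-register"
    a = into_register(src1) if first_was_temp else src1
    # Rule 2: a constant second operand must lie in [-16, 15].
    too_wide = src2[0] == ":tagged-literal" and not -16 <= src2[2] <= 15
    b = into_register(src2) if too_wide else src2
    # Rule 1 again: the last operand must be a general-purpose register.
    if dest[0] == ":j-register":
        code.append(("add", a, b, dest))
    else:
        result = a if first_was_temp else (":j-register", free.pop())
        code.append(("add", a, b, result))
        code.append(("move", result, dest))
    return code
```

Applied to the example, adding frame slot 6 and the literal 30 into frame slot 7, the sketch yields the same four instructions; with a literal such as 5 the second move disappears and three instructions suffice.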

As  another  example,  consider  a  hybrid  instruction  to  move  an  immediate  into  a  frame  slot: 

(:move (:literal (:integer 500))
       (:frame (:base 20)))

Because 500 is more than seven bits long, it must be loaded into R0 through the DC instruction:

(dc (:tagged-literal int 500))
(move (:j-register R0)
      (:frame (:base 20)))


Like  immediates,  offsets  from  the  frame  base  can  only  be  five  bits  in  three-operand  instructions 
and  seven  bits  in  two-operand  instructions.  If  the  destination  of  the  above  move  had  an  offset 
of 100 instead of 20, the code would be:

; (:move (:literal (:integer 500)) (:frame (:base 100)))
(dc (:tagged-literal int 500))
(move (:j-register R0)
      (:j-register R3))
(dc (:tagged-literal int 100))
(move (:j-register R3)
      (:frame (:base (:j-register R0))))


This illustrates the R0 conflict maneuver described in Section 3.4.3.
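
The two move examples can be reproduced by a short Python sketch. It covers only the case discussed here, an immediate too wide to encode directly, and it assumes that the seven-bit fields are signed (range [-64, 63]), which is consistent with 20 fitting and 100 not fitting. The names are hypothetical.

```python
R0 = (":j-register", "R0")
R3 = (":j-register", "R3")


def fits7(n):
    """Assumed signed 7-bit field of a two-operand instruction."""
    return -64 <= n <= 63


def legalize_wide_move(value, offset):
    """Move a wide immediate into frame slot `offset`."""
    code = [("dc", (":tagged-literal", "int", value))]   # DC always loads R0
    if fits7(offset):
        code.append(("move", R0, (":frame", (":base", offset))))
        return code
    # The offset needs R0 as well, so the value is first moved out of the way.
    code.append(("move", R0, R3))
    code.append(("dc", (":tagged-literal", "int", offset)))
    code.append(("move", R3, (":frame", (":base", R0))))
    return code
```

The second path is the R0 conflict in miniature: both the value and the offset can only enter the machine through R0, so one of them has to be relocated before the other is loaded.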


3.5  Convert  Simple  to  ASM 

This last stage converts the code to a format suitable for the MDP assembler. This involves
converting from s-expressions into plain text and translating the operands into a suitable
format. Offsets are converted:

(:register (:base X)) → [X,A0]
(:frame (:base X))    → [X,A2]
(:message (:base X))  → [X,A3]

The first transformation is to convert hybrid registers to temporary storage. On the
J-Machine, accesses off of A0 are absolute addresses. The first twenty words of MDP memory
are devoted to temporary storage, so hybrid register n is stored at absolute address n on the
J-Machine. As on the hybrid architecture, the value is not guaranteed to be the same between
suspensions.
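
The offset conversions amount to choosing an address register by operand kind. As a hedged Python sketch (the tuple encoding and the function name are assumptions, and operands are assumed to have lost any :suspensive marker by this stage):

```python
BASE_REGISTER = {":register": "A0", ":frame": "A2", ":message": "A3"}


def render_offset(operand):
    """Render (kind (:base X)) as the assembler text [X,An]."""
    kind, (_, offset) = operand
    return f"[{offset},{BASE_REGISTER[kind]}]"
```

Hybrid register 0 therefore becomes `[0,A0]`, an absolute address in the temporary storage area, while frame slot 6 becomes `[6,A2]`.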

Additionally,  assemble-time  references  must  be  output  properly.  When  a  reference  is 
encountered  as  an  operand,  it  is  converted: 

(:tagged-literal special-tag X) → {X_msg_ref}

Additionally,  X  is  added  to  a  list  of  references.  At  the  end  of  compilation,  for  each  reference 
X  in  the  list,  the  following  is  output: 

T«t  X_msg_ref  =  MSG:  (((X+H_loc)«10)>+2 

where  N  is  the  name  of  the  procedure.  This  creates  a  reference  whose  value  includes  the 
absolute  address  of  the  associated  label  (labels  are  usually  relative  addresses),  as  well  as 
specifying  that  it  will  be  used  as  a  header  of  a  message  with  two  words.  (The  two  words  will 
be the message itself and the frame value.)

3.6  Conclusion 

In  order  to  convert  hybrid  code  to  MDP  assembly  code,  I  created  intermediate  formats  and 
routines  to  convert  from  more  complex  formats  to  simpler  ones.  These  are  useful  not  only  for 
this compiler but as a general-purpose J-Machine utility. An MDP assembly coder or compiler
writer could produce complex J-Machine code and be spared the trouble of remembering how
many bits of operand are available for each instruction. While my register allocation is still
too  primitive  to  give  optimal  results  —  for  example,  the  same  value  could  be  stored  in  two 
different  registers  —  it  is  still  good  enough  to  provide  a  new  dialect  of  MDP  assembly  language 
that a programmer might choose for its greater abstraction and simplicity.

Chapter 4


Analysis

“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the
running you can do, to keep in the same place. If you want to go somewhere else,
you must run at least twice as fast as that.”

—  Lewis  Carroll,  Through  the  Looking-Glass. 

I  am  pleased  with  the  system,  in  that  it  works  and  reasonable  solutions  were  found  for 
every  problem.  However,  while  some  of  the  mechanisms  worked  out  well,  not  all  turned  out  to 
be  as  efficient  as  I  would  like.  In  this  chapter,  I  provide  a  detailed  example  of  code  produced 
and executed for an Id routine, several benchmark results, and analysis of both my system
and  the  J-Machine. 


4.1 Detailed Benchmark: Factorial

In this section, I will go into great detail by providing listings and statistics for a sample Id
procedure. Specifically, I will describe the composition and execution of the simple recursive
factorial  program  shown  in  Figure  4-1. 

4.1.1  The  Dataflow  Graph 

First, the initial stages of the compiler convert the program into a dataflow graph, such as the
one shown in Figure 4-2. I have abstracted away some of the details in order to highlight the
essential parts of the graph. First, the input arrives at node 1. It is passed through unchanged
by the identity instruction to nodes 2 and 3. Node 2, the predicate, passes a boolean value to
node 3, a switch instruction. The semantics of the switch instruction are such that it passes
its data input to the left output arc if the control input is true, and to the right arc if the
control input is false. Thus, if the predicate is true (i.e. if the argument is less than or
equal to one), the argument itself will be sent to node 9 and returned. In the inductive case,
the argument is sent to identity node 4. Node 7, call fact, makes the recursive call, specifying
that the return value should be sent to node 8, mul. When it arrives, the multiplication is
performed, and a value is sent to node 9 to be returned.

fact n =
  if n <= 1 then
    n
  else
    n * fact (n-1);


Figure 4-1: Id Code for Factorial


Figure 4-2: A Dataflow Graph for Factorial. If an integer n is input to the top identity node,
n! will be computed. The switch node uses its left input as a control signal and its right input
as data. If the control signal is true, data goes to the left output arc; otherwise, to the right.
The identity node copies its inputs to its output arcs. The dotted line from the call node
to the mul node indicates that the connection is indirect. The numbers are for expository
purposes only.

The purpose of showing and describing the graph is to give an idea of how the compiler
looks at a procedure. Iannucci's stages of the compiler can only see the dataflow graph, not
the source code.
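
To make the role of the switch node concrete, the graph can be mimicked in a few lines of Python. This is a sequential sketch for illustration only: it does not model parallel firing or suspension, and the function names are assumptions rather than anything produced by the compiler.

```python
def switch(control, data):
    """Return (left, right); the data appears on exactly one output arc."""
    return (data, None) if control else (None, data)


def fact(n):
    control = n <= 1                    # node 2, the predicate
    left, right = switch(control, n)    # node 3, the switch
    if left is not None:
        return left                     # base case: the argument is returned
    return right * fact(right - 1)      # nodes 7 and 8: call fact, then mul
```

Each call corresponds to one activation of the graph; the value returned by the recursive call plays the part of the token arriving at the mul node.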


4.1.2  The  Hybrid  Code 


The hybrid code produced by the factorial example is shown in Figure 4-3. I have added
comments, lines headed with semicolons, to describe the process. Readers uninterested in
such technical detail should skip to Figure 4-4 which shows the SQs' composition at a higher
level. Figure 4-5 shows frame usage.


;;; SQ-1 does initialization, forks local SQs, and tries to return
;;; the result.
((:LABEL (:LITERAL (:SYMBOL :SQ-1))))
;; Put the codeblock pointer to FACT into [12].
((:MOVE (:LITERAL (:CODE-BLOCK :FACT)) (:FRAME (:BASE 12))))
;; Fork SQ-2, immediately suspending it if [3], n, is empty.
((:CONTINUE-TEST (:FRAME (:BASE 3) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-2))))
;; Fork SQ-11, immediately suspending it if [0], the return location,
;; is empty.
((:CONTINUE-TEST (:FRAME (:BASE 0) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-11))))
((:LABEL (:LITERAL (:SYMBOL :SEND-RESULT-0))))
;; Pass [6], the return value, up to offset 1 from the calling frame.
((:MOVE-REMOTE (:FRAME (:BASE 0) :SUSPENSIVE)
               (:LITERAL (:INTEGER 1))
               (:FRAME (:BASE 6) :SUSPENSIVE)))
;; Pass [5], a signal ("true"), up to offset 0 from the calling frame.
((:MOVE-REMOTE (:FRAME (:BASE 0))
               (:LITERAL (:INTEGER 0))
               (:FRAME (:BASE 5) :SUSPENSIVE)))
((:TERMINATE))

;;; SQ-11 sets [5], the signal, when locations [0] and [7] have data.
((:LABEL (:LITERAL (:SYMBOL :SQ-11))))
((:TEST-2 (:FRAME (:BASE 0) :SUSPENSIVE)
          (:FRAME (:BASE 7) :SUSPENSIVE)
          (:FRAME (:BASE 5))))
((:TERMINATE))

;;; SQ-2 evaluates the predicate and runs appropriate code.
((:LABEL (:LITERAL (:SYMBOL :SQ-2))))
;; Put in [4] the result of checking if [3], the argument, is <= 1.
((:<= (:FRAME (:BASE 3) :SUSPENSIVE)
      (:LITERAL (:INTEGER 1))
      (:FRAME (:BASE 4))))
;; If not, branch to ELSE-4.
((:BRANCH-FALSE (:FRAME (:BASE 4)) (:LITERAL (:SYMBOL :ELSE-4))))
;; Copy [3], the argument, into [6], the slot for the result.
((:MOVE-IDENTITY (:FRAME (:BASE 3) :SUSPENSIVE) (:FRAME (:BASE 6))))
;; Copy [4], the predicate result ("true") into [7].
((:MOVE-IDENTITY (:FRAME (:BASE 4)) (:FRAME (:BASE 7))))
;; Branch past inductive case code.
((:BRANCH (:LITERAL (:SYMBOL :END-IF-4))))

;; This code gets executed for the inductive case.
((:LABEL (:LITERAL (:SYMBOL :ELSE-4))))
;; Subtract 1 from [3], the argument, and put the result in [14].
((:- (:FRAME (:BASE 3) :SUSPENSIVE)
     (:LITERAL (:INTEGER 1))
     (:FRAME (:BASE 14))))
;; Spawn the codeblock whose name is in [12] (fact), putting the context
;; value into [10].
((:GET-CONTEXT (:FRAME (:BASE 12)) (:FRAME (:BASE 10))))
;; Specify [8] as the base location for return values for the
;; spawned procedure.
((:INDEX-CURRENT-CONTEXT (:LITERAL (:BASE 8)) (:REGISTER 0)))
;; Send this adjusted context (i.e. the return location) to slot zero
;; in the spawned procedure.
((:MOVE-REMOTE (:FRAME (:BASE 10))
               (:LITERAL (:INTEGER 0))
               (:REGISTER 0)))
;; Send [14], the argument minus 1, to slot three in the spawned procedure.
((:MOVE-REMOTE (:FRAME (:BASE 10))
               (:LITERAL (:INTEGER 3))
               (:FRAME (:BASE 14))))
;; Fork SQ-5, immediately suspending it if [3], the argument, isn't here.
((:CONTINUE-TEST (:FRAME (:BASE 3) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-5))))
;; Fork SQ-8, immediately suspending it if [8], the signal that the spawned
;; procedure is done, isn't here.
((:CONTINUE-TEST (:FRAME (:BASE 8) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-8))))
((:LABEL (:LITERAL (:SYMBOL :END-IF-4))))
((:TERMINATE))

;;; SQ-8 frees the context of the spawned procedure if it's not needed
;;; any more.
((:LABEL (:LITERAL (:SYMBOL :SQ-8))))
;; Suspend if [8], the signal that the spawned procedure is done, isn't present.
((:TEST-1 (:FRAME (:BASE 8) :SUSPENSIVE) (:REGISTER 0)))
;; Return [10], the context of the spawned procedure, writing true into [11].
((:RETURN-CONTEXT (:FRAME (:BASE 10) :SUSPENSIVE) (:FRAME (:BASE 11))))
;; Copy [11], the true signal, into [7], a signal that all work is done.
((:MOVE-IDENTITY (:FRAME (:BASE 11)) (:FRAME (:BASE 7))))
((:TERMINATE))

;;; SQ-5 is spawned only for the inductive case.
((:LABEL (:LITERAL (:SYMBOL :SQ-5))))
;; Multiply [3], the argument, by [9], the value returned by the recursive
;; call, putting the result into [13].
((:* (:FRAME (:BASE 3) :SUSPENSIVE) (:FRAME (:BASE 9) :SUSPENSIVE)
     (:FRAME (:BASE 13))))
;; Move this value into [6], the slot for the return value.
((:MOVE-IDENTITY (:FRAME (:BASE 13)) (:FRAME (:BASE 6))))
((:TERMINATE))


Figure  4-3:  Hybrid  Code  for  Factorial. 
Figure 4-4: There are five scheduling quanta in factorial. The numbers in the SQ names have
no  significance,  except  that  the  first  is  always  named  SQ-1.  Arrows  indicate  where  SQs  may 
be  forked  and  can  be  thought  of  as  a  subset  of  data  dependences.  SQs  5  and  11  are  only 
spawned  in  the  recursive  case.  Observe  that  on  its  first  execution,  SQ-1  will  fault  midway 
through,  because  the  return  results  will  not  be  ready.  Execution  will  restart  in  the  middle  of 
the  SQ. 
Base Case                  Slot   Recursive Case

FD of return location        0    FD of return location
unused                       1    unused
unused                       2    unused
argument (n)                 3    argument (n)
n <= 1 ?                     4    n <= 1 ?
signal, [0] & [7] full       5    signal, [0] & [7] full
result                       6    result
signal, [4] true             7    signal, [11] full
                             8    signal, rec. call done
                             9    result of rec. call
                            10    context of rec. call
                            11    signal, [10] freed
                            12    fact_codeblock
                            13    [3] * [9]
                            14    n - 1

Figure  4-5:  Frame  Slots  Used  by  Factorial  Code.  The  left  frame  shows  slot  usage  in  the  base 
case,  and  the  right  frame  shows  slot  usage  in  the  recursive  case.  Signals  are  flags  that  are  set 
to  indicate  that  the  described  condition  has  been  met;  i.e.  [5]  is  explicitly  set  to  true  after 
values  are  written  to  [0]  and  [7].  Note  that  the  same  slot,  such  as  [7],  can  have  a  different 
meaning  for  the  two  mutually  exclusive  cases. 
4.1.3 The MDP Code


The MDP code is included in Appendix A.1. It has all of the same characteristics as the hybrid
code, i.e. the same frame slot assignments and SQs (modulo my slightly different calling
convention). The hybrid code had 28 instructions; the MDP code has 180, not counting code
in library routines. Thus there are an average of 6.4 MDP instructions per hybrid instruction.
This blow-up is not as bad as it seems because an MDP instruction word is roughly one-fourth
the size of a hybrid instruction word.* Part of the growth is thus the accepted expansion factor
between CISCy and RISCy architectures. As the reader will recall, there are two reasons that
one hybrid word expands into many instruction words: first, hybrid instructions are more
powerful and better suited to the special purpose than those of the J-Machine; second, an
expansion occurs to fit the code into the more restrictive MDP addressing modes.
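
The arithmetic behind these figures can be checked with a short sketch. It is written in Python purely for illustration (nothing in the system itself is written in Python), the function names are hypothetical, and the one-fourth word-size factor is the rough estimate quoted above rather than a measured value.

```python
HYBRID_INSTRUCTIONS = 28   # hybrid instructions for factorial (from the text)
MDP_INSTRUCTIONS = 180     # MDP instructions, excluding library routines
WORD_SIZE_FACTOR = 0.25    # assumed: an MDP word is roughly 1/4 of a hybrid word


def instruction_ratio(hybrid: int, mdp: int) -> float:
    """MDP instructions emitted per hybrid instruction."""
    return mdp / hybrid


def size_ratio(hybrid: int, mdp: int, word_factor: float) -> float:
    """Approximate growth in code size once word sizes are taken into account."""
    return instruction_ratio(hybrid, mdp) * word_factor


if __name__ == "__main__":
    print(f"{instruction_ratio(HYBRID_INSTRUCTIONS, MDP_INSTRUCTIONS):.1f}")
    print(f"{size_ratio(HYBRID_INSTRUCTIONS, MDP_INSTRUCTIONS, WORD_SIZE_FACTOR):.1f}")
```

Under these assumptions, the instruction-count ratio of about 6.4 corresponds to a growth in code size of only about 1.6.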

4.1.4  Load  Balancing 

As mentioned in Section 2.3.2, my compiler does no load balancing. The user must modify the
code produced by the get-context instruction to spawn procedures to an appropriate processor,
usually a function of the argument(s); otherwise, all calls will go to the same node. Because
factorial is singly recursive, it makes sense to spawn (fact n) onto processor n, because no
task will already be running there. I changed one line of the compiled routine to implement
this. If n were potentially larger than p, the number of processors, we would take its value
modulo p. This would guarantee an even distribution.
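
A minimal sketch of this mapping follows, in Python for illustration only. The function name `target_processor` is hypothetical; the actual change was a one-line edit to the compiled MDP routine, which is not reproduced here.

```python
from collections import Counter


def target_processor(n: int, p: int) -> int:
    """Processor on which to spawn (fact n), given p processors numbered 0..p-1."""
    return n % p


if __name__ == "__main__":
    # With 4 processors, the calls (fact 8) down to (fact 1) are spread evenly.
    placement = Counter(target_processor(n, 4) for n in range(1, 9))
    print(sorted(placement.items()))
```

Because successive recursive calls have consecutive arguments, taking the argument modulo p visits the processors in round-robin order.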

4.1.5  Dynamic  Counts 

When I ran (fact 4) on the MDP simulator, it took 1263 ticks for the result to be written
to the original calling frame. A tick is the time unit used by the simulator: one tick equals
one instruction, even though not all instructions on the J-Machine will take the same time.
The simulator also ignores network latency. Four processors were enabled, and utilization was
37%; i.e. on average, a processor did useful work a little over a third of the time. Fault

*I cannot give an exact length for hybrid words, because the compiler I used was for a paper version of the
architecture where word lengths were essentially unlimited. According to Iannucci in private correspondence,
the word size can be roughly thought of as 64 bits.
Handler Name     Times Called   Ticks/Call     Total Ticks

Lookup                25        5 or 6 + 6w        212
CFUT                  21        18                 378
Move-Remote           16        13 + 7w            264
Continue-Test         14        7 or 20            189
Get-Context            3        24                  72
Allocate               3        12                  36
Total                136        n/a               1061

Table 4.1: System Calls for (fact 4). Ranges are specified for the ticks/call column because
the time may depend on the data. For lookup and move-remote, w specifies the number of
waiting continuations. An estimated average number is used (with w = 1/2) to approximate
the total ticks.


Instruction Type   Times Used   Percent   Comments

Move                  882        47.8     Both reg-reg and reg-frame
Field                 247        13.3     Operations on tags
Network               237        12.8     Sending messages
DC                    159         8.6     Loading constants into R0
Branch                125                 Does not include busy-looping
Fault                  87                 Entering and leaving system calls
ALU                    61                 ALU ops for program and libraries
NOP                    46                 NOPs used as padding to align instructions

Table  4.2:  Dynamic  Instruction  Usage  for  (fact  4). 


and  library  usage  is  shown  in  Table  4.1.  As  the  totals  show,  84%  of  the  time  was  spent  in 
the  libraries.  The  routine  that  consumed  the  most  time  was  the  cfuture  fault  handler.  It 
was  called  21  times,  and  each  time  took  18  ticks.  As  described  in  Section  2.2.4,  the  cfuture 
handler must allocate space to store a continuation and fill in the necessary data. Dynamic
instruction  usage  (not  counting  idle  cycles)  is  shown  by  category  in  Table  4.2. 

The  average  number  of  instructions  executed  per  message  is  92.6,  which  is  larger  than 
the  55  instructions  per  message  empirically  found  by  Horwat  in  [Horwat  1989,  page  104].  His 
Concurrent Smalltalk version of the same factorial program takes only 315 ticks to complete
[Horwat 1989, page 110], compared to my 1263.
                          Ticks/Skewed Calls        Ticks/Nonskewed Calls
Argument   Ticks/Call   1st Result   2nd Result   1st Result   2nd Result

    4         1263         1864         2163         1992         2271
    8         2691         4204         4590         4332         4611
   12         4119         6544         6846         6672         6951

Table  4.3:  Throughput  for  Factorial.  This  table  compares  the  number  of  ticks  required  to 
compute  one  and  two  calls  of  factorial.  For  each  case,  the  number  of  processors  used  is  the 
same  as  the  argument.  The  first  data  column  shows  how  long  it  takes  for  one  call  executing 
alone to complete. The second set of columns shows how long it takes to complete two factorial
calls  made  at  the  same  time,  skewed  among  the  enabled  processors.  The  last  set  of  columns 
shows  the  completion  times  when  the  two  calls  are  not  skewed  among  the  processors. 


4.1.6  Throughput 

One reason for the high latency is that, at every design decision, throughput was favored over
latency. This is due to the decision to break apart any transaction of unbounded latency, which
increased the latency of tasks but improved throughput. Table 4.3 shows that computing two
invocations of factorial concurrently on the J-Machine takes significantly less than twice as
much time as computing a single call. This is true for two reasons:

1. Each task suspends itself when it is waiting for a result from another processor.

2. The factorial calls can be skewed among the processors.

The table isolates these factors by including results for when the procedure calls are skewed
and when they are not. Even when two factorial calls execute on the same processors, in the
same order, throughput is increased over the single call case. This is because subtasks of the
second factorial call can execute when no work can be done on a given processor toward the
first factorial call.
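
The claim can be checked directly against the numbers in Table 4.3. The following Python sketch (illustrative only; the dictionary layout and the function name are assumptions) computes how much longer two concurrent calls take than a single call, in both the skewed and nonskewed cases.

```python
# Ticks from Table 4.3:
# argument -> (single call, second result when skewed, second result when nonskewed)
TABLE_4_3 = {
    4: (1263, 2163, 2271),
    8: (2691, 4590, 4611),
    12: (4119, 6846, 6951),
}


def relative_cost(single: int, pair: int) -> float:
    """Time to finish two concurrent calls, as a multiple of the single-call time."""
    return pair / single


if __name__ == "__main__":
    for arg, (single, skewed, nonskewed) in TABLE_4_3.items():
        print(arg,
              f"skewed {relative_cost(single, skewed):.2f}x",
              f"nonskewed {relative_cost(single, nonskewed):.2f}x")
```

A value below 2 means that running the two calls together finishes sooner than running them one after the other.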

4.1.7  Conclusion 

Although I was pleased that the throughput of the system was better than the reciprocal
of the latency, I was disappointed by the high latency, predictable though it was. One of
my purposes in following the factorial program through each step was to show where all the
overhead was added. I should have expected the traditional costs of simulating one architecture
on another. Almost half the time was spent in the “cfuture” and “lookup” handlers,
which store suspended continuations and revive them, respectively. It would be impossible
to simulate these features with high efficiency. The problem can be summarized succinctly:
Because almost all the synchronization is handled in software, it is impractical to synchronize
on individual frame slots. While the costs incurred to synchronize on arguments and return
values would be reasonable, synchronizing on temporary values is excessively expensive. This
is exacerbated by the hybrid compiler’s lavish creation of frame slots, which make sense on
its architecture but not when synchronization is done in software.

While I was initially optimistic after the 6.4:1 code expansion because of the normal
CISC/RISC trade-offs, we see now that this number is actually irrelevant, as the vast majority
of time is spent in library routines that the hybrid architecture would have in hardware.
Simulating the hybrid architecture was thus not an optimal choice for implementing dataflow
on the J-Machine. A better choice is described in the next chapter.


4.2 Fibonacci

Another program I benchmarked was the doubly-recursive Fibonacci routine shown in Figure 4-6.
The corresponding MDP code is in Appendix A.2. There are 46 lines of hybrid code,
which translate to 271 lines of MDP code, yielding a ratio of 5.9:1. Because its transformation
is so similar to factorial’s, I will not go into detail, except to mention that I added a
distribution function to balance the load. Empirically, I settled on the function
((p and n) + (p or n)) and 31, where p is the current processor number and n the new argument.
In runs with more than one processor, I used this function to map calls to processors.


def fib n =
    if n <= 1 then
        n
    else
        fib (n-1) + fib (n-2);


Figure 4-6: Id Code for Fibonacci
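
The distribution function can be sketched in Python as below. This is illustrative only: it assumes that "and" and "or" denote bitwise operations on non-negative integers, and the name `distribute` is hypothetical. Under that reading, (p and n) + (p or n) always equals p + n, so the function amounts to (p + n) modulo 32, which matches the 32 simulated processors.

```python
def distribute(p: int, n: int) -> int:
    """Processor for a call with argument n spawned from processor p.

    Assumes 'and'/'or' in the text are bitwise operators; the final mask
    keeps the result in the range 0..31.
    """
    return ((p & n) + (p | n)) & 31


if __name__ == "__main__":
    # Bitwise AND plus bitwise OR of two non-negative integers equals their sum.
    print(all(distribute(p, n) == (p + n) % 32
              for p in range(32) for n in range(64)))
```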
Argument   # Processors   Number of Ticks

    1            1               166
    4            1              4353
    4            6              2105
    6            1             13760
    6           13              3473
    8           22              5628
   10           32              9566
   12           32             67641

Table 4.4: Timings for Fibonacci. Note that, until the argument gets very large, the growth in
number of ticks is not exponential when many processors are used. Computing Fib(n) takes
roughly ((1 + sqrt(5))/2)^n procedure calls, which can be distributed among the processors.
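
The growth rate quoted in the caption can be illustrated by counting calls directly. The Python sketch below is illustrative only; `calls` is a hypothetical helper that counts every invocation made by the doubly-recursive definition of fib, and the comparison with the golden ratio holds only up to a constant factor.

```python
from functools import lru_cache
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the golden ratio


@lru_cache(maxsize=None)
def calls(n: int) -> int:
    """Number of procedure calls made when evaluating fib n recursively."""
    if n <= 1:
        return 1
    return 1 + calls(n - 1) + calls(n - 2)


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, calls(n), round(calls(n) / PHI ** n, 2))
```

The ratio of the call count to PHI ** n settles near a constant, which is what "roughly" means in the caption.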


The times and statistics for execution with different arguments are shown in Table 4.4. Note
that for low arguments using multiple processors, growth is closer to linear than exponential.
This is illustrated in Figure 4-7.^

While the number of ticks was higher than I would have liked for Fibonacci, the change
in its order of growth was just the sort of thing one hopes to see on a parallel computer. I
was only able to simulate 32 processors. The results should be fantastic when a 4096-node
J-Machine comes on-line.



4.3  Loop  Parallelization 

In the Fibonacci example, the parallelism was due to a function distribution strategy that I
added by hand, so it cannot really be counted as part of the system. This is in contrast to
loop parallelization, for which it is straightforward for the compiler to provide parallelism: If
there are K iteration areas, each need only be assigned unique processors to send subcalls to;
for example, iteration area i could spawn its subcalls to processors i, i + K, etc. Because K
would be available at run-time (and optionally at compile-time), this could easily be computed.
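
One way to express this assignment is sketched below in Python, for illustration only. The function name, the `call_index` parameter, and the wrap-around modulo the processor count P are assumptions; the text specifies only the sequence i, i + K, and so on.

```python
def subcall_processor(area: int, call_index: int, K: int, P: int) -> int:
    """Processor for the call_index-th subcall of iteration area `area`.

    K is the number of iteration areas and P the number of processors.
    Wrapping modulo P is an assumption made for this sketch.
    """
    return (area + call_index * K) % P


if __name__ == "__main__":
    K, P = 4, 16
    for area in range(K):
        print(area, [subcall_processor(area, j, K, P) for j in range(P // K)])
```

When K divides P, the processor sets used by different iteration areas do not overlap.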

Because the compiler did not handle loops, as explained in Section 2.3.3, I compiled simple
loop programs by hand and did not have the time or compute-power for a large example. The

^Unfortunately,  I  was  unable  to  get  a  Concurrent  Smalltalk  timing  to  compare  it  to. 
[Plot: number of ticks versus argument]

Figure 4-7: Plot of Ticks for Fibonacci

def loop n =
  { sum = 0
    in
      { for i <- 1 to n do
          sum_increment = sum + i;
          next sum = sum_increment
        finally sum }};


Figure  4-8:  Id  Code  for  Loop  Example 


loop program I used is shown in Figure 4-8. The program returns the sum of the first n
integers.*

The produced code may be seen in Appendix A.3. There are 48 lines of hybrid code which
were translated to 188 lines of MDP code, for a 3.9:1 ratio. The better instruction ratio is
due to my hand-compiling the code rather than using my non-optimizing compiler. While I
purposely did not generate top-quality code, I still used better register allocation than the
compiler, saving reloads. Another factor was the reliance on additional library routines.

This program is a useful benchmark in that it shows the overhead to set up iteration areas
and to launch iterations of a loop in parallel. The number of ticks, as a function of K, the
number of iterations to unroll, and n, the argument, is 50 + 5 * K + 135 * n. The three addends
of the formula can be interpreted:

1.  The  constant  term,  50,  indicates  that  the  additional  cost  for  a  procedure  to  use  loop 
parallelization  is  low.  There  is  thus  no  inhibition  against  parallelizing  loops. 

2.  The  5  *  K  term  is  a  pleasant  surprise:  Once  the  base  cost  for  loop  parallelization  has 
been paid, it only costs 5 ticks to add and support each iteration area. This makes it
reasonable  to  unroll  many  iterations  of  a  loop. 

3.  The  135  *  n  term  shows  that  each  dynamic  iteration  of  the  loop  is  costly.  However,  this 
also can be thought of in terms of constant overhead: If each iteration of the loop spawns
a long subroutine, as in the example in Figure 2-13, the only additional code that will

*The body of the loop could have been written more succinctly as next sum = sum + i. I work with this
version because the hacked-up version of Iannucci’s compiler could not compile new loop programs, and the
hybrid code for this example was the only available.
execute on the loop processor is that to spawn a procedure call. This means that each
iteration  of  the  loop  will  use  fewer  than  200  ticks  on  its  home  processor,  regardless  of 
how  big  a  computation  it  performs.  As  described  above,  it  is  trivial  to  distribute  its 
procedure  calls  so  that  they  do  not  interfere  with  those  of  other  iterations. 
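
The cost formula can be written out as a small Python function, for illustration only. The name `loop_ticks` is hypothetical; the constants are the ones given in the text for this benchmark.

```python
def loop_ticks(K: int, n: int) -> int:
    """Ticks for the loop benchmark: K unrolled iteration areas, argument n."""
    base_cost = 50          # fixed cost of using loop parallelization
    per_area_cost = 5       # cost of each additional iteration area
    per_iteration_cost = 135  # cost of each dynamic iteration
    return base_cost + per_area_cost * K + per_iteration_cost * n


if __name__ == "__main__":
    # Doubling K changes the total far less than doubling n.
    print(loop_ticks(4, 10), loop_ticks(8, 10), loop_ticks(4, 20))
```

The comparison shows why unrolling many iterations is cheap relative to the per-iteration cost.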

I thus consider the loop parallelization strategy a success, although I am still dissatisfied
with the overhead. A primary reason for the high overhead is the small number of registers
on the MDP. There are only four general-purpose registers and four address registers. Two
of the address registers are special-purpose and cannot be used by my system, and one, A2,
is dedicated to holding the base of frame. This only leaves one address register, A1, to use
as an iteration area pointer, which is inadequate. Because hybrid addressing modes exist to
directly access slots of the previous, current, and next iteration areas, as well as offsets from
the current frame, it would be useful to have spare address registers for each of these pointers.
As things are now, the value in A1 keeps getting clobbered as references are made to the other
iteration areas, requiring the addresses to be recomputed frequently. Because of the shortage
of general-purpose registers, I cannot use them to cache frequently-needed values.

4.4  Conclusion 

For simple programs like factorial and Fibonacci, the code performed several times worse
than Concurrent Smalltalk code. While this is disappointing, it is to be expected, as one
architecture is being used to simulate another. Loop parallelization provided very promising
results, particularly because the semantics of Id and the state of its compiler are such that the
programmer need never be aware of possible parallelization. Any gain in parallelization and
efficiency that occurs without any programmer effort is a big win.
Chapter 5


Conclusion 

And  oftentimes,  to  win  us  to  our  harm, 

The  instruments  of  darkness  tell  us  truths, 

Win us with honest trifles, to betray ’s
In  deepest  consequence. 

William  Shakespeare,  Macbeth,  Act  I,  Scene  iii,  line  123. 

The current system has several strengths and weaknesses. I consider its primary strengths
to  be: 

•  It  successfully  simulates  the  hybrid  architecture  within  an  acceptable  factor  of  code 
expansion. 

• It includes a powerful loop parallelization strategy that shows the feasibility of concurrent
execution of iterations of a loop.

• The observed throughput of the system implies that it succeeds to some extent at latency
toleration, something more important in real systems and big programs than in toy
benchmarks.

• By taking advantage of the Id language and compiler, it is possible to write parallel
programs for the J-Machine without explicit mention of parallelism.

The only disappointment is that the costs of going through the hybrid architecture may
outweigh the benefits. There are three incremental approaches that can be taken in future
efforts: improving MDP code generation, improving hybrid code generation, and eliminating
weaknesses of the J-Machine. I discuss each of these and then propose taking a different
approach.


5.1  Improving  MDP  Code 

As mentioned in appropriate sections throughout the document, the MDP code I produce is
not optimal. Specifically, register assignment is primitive, and various peephole optimizations
could be performed. In contrast, the libraries (see Appendix B) are tightly hand-coded, as
I wrote them directly in MDP assembly language. Because roughly 80% of execution time
is spent in the libraries, local optimizations of compiled code are unimportant. Even if I
could double the speed of the compiled code produced, the total execution time would only
decrease by 10%. Therefore, it is not feasible to drastically improve the code through local
optimization.
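
This is the familiar Amdahl's-law argument, and the arithmetic can be sketched in Python as follows (illustrative only; the function name is hypothetical and the 80% library share is the rough figure quoted above).

```python
def remaining_time(library_fraction: float, compiled_speedup: float) -> float:
    """Fraction of the original run time left after speeding up only the
    compiled (non-library) code by the given factor."""
    compiled_fraction = 1.0 - library_fraction
    return library_fraction + compiled_fraction / compiled_speedup


if __name__ == "__main__":
    # 80% of the time in libraries, compiled code made twice as fast.
    print(f"{remaining_time(0.8, 2.0):.2f}")
```

Even an infinitely fast compiled portion would leave the 80% spent in the libraries untouched.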


5.2  Improving  Hybrid  Code 

One problem with my system is that the hybrid code I begin with is non-optimal, particularly
in terms of the J-Machine, where cfuture faults, lookup calls, etc., are costly. I think
optimizations to the hybrid compiler would go much further than ones in my back-end for the
MDP. For several reasons, however, it seems that modifying the hybrid compiler would be a
poor idea:

1.  The  hybrid  compiler  does  not  fit  quite  properly  on  top  of  the  current  version  of  the  base 
Id  compiler,  and  work  would  be  required  to  bring  them  into  synch. 

2.  Particularly  because  the  code  was  written  by  someone  else,  writing  new  code  might  be 
easier than modifying it. This is meant not to criticize Iannucci’s excellent and very
readable coding style but as a general comment on the difficulty of one programmer’s
modifying  another’s  code. 

3.  If  extensive  modification  or  a  re-write  is  necessary,  there  is  no  reason  for  the  extra  costs 
added by going through an intermediate architecture.
Because the hybrid architecture is too different for the J-Machine to execute its code as
efficiently  as  code  generated  specifically  for  the  J-Machine,  it  makes  little  sense  to  put  effort 
into  generating  hybrid  code  that  would  be  better  for  the  J-Machine. 


5.3  Strengths  and  Weaknesses  of  the  J-Machine 

Several  features  of  the  J-Machine  make  it  excellent  for  running  dataflow  code;  it  was  designed 
to support fine-grained computation as described in [Dally 1988a]. The features not found on
most computers that proved most beneficial were:

1.  Hardware  support  for  cfutures. 

2.  The  low-latency  network  which  gives  the  freedom  to  send  frequent  messages  encouraging 
the  division  of  tasks. 

3. User-defined tag types, which aided debugging.

4. The large number of processors that will be available.

There  were  some  things  I  did  not  like  about  the  J-Machine.  Suggestions  for  alleviating  two  of 
the  worst  problems  are: 

1.  Increase  the  number  of  address  and  general-purpose  registers.  Four  of  each,  particularly 
when  some  have  special  purposes,  is  inadequate,  as  described  in  Section  4.3. 

2.  Hardware  support  for  cfuture  suspensions  would  make  frequent  synchronization  much 
more affordable. The 18 ticks for each call of the cfuture fault handler is too expensive.

At  a  recent  Concurrent  VLSI  Architecture  group  meeting,  I  was  pleased  to  find  that  others  felt 
the  same  needs  and  that  such  changes  might  be  made  for  the  next  version  of  the  J-Machine. 

In several instances, however, of imperfect fits between hybrid code and the J-Machine, it
is  impossible  to  blame  either  architecture.  From  this  observation  and  the  above  descriptions 
of  rejected  ideas  for  incremental  changes,  I  would  like  to  propose  a  different  approach  that 
does  not  rely  on  trying  to  fit  the  two  architectures  together. 
5.4 Synchronization on Tokens


After reading this document based on Traub and Iannucci’s method of partially sequentializing
dataflow programs, it is difficult to step back and imagine a different method that does
not use frames and continuation lists. Such a method exists, based more directly on Greg
Papadopoulos’ explicit token store (ETS) [Papadopoulos 1988].* The basic idea, used on
Monsoon, is that each cycle, a token is removed from the queue. Its context value, c, is added
to the destination instruction offset, s, and that location is checked. If the location is empty,
the value, v, is stored there. If the location is not empty, the value stored there must be the
other argument, so the instruction is executed. It is not obvious why this method is better
for the J-Machine, but empirical results suggest it is.
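
The matching rule for a two-input instruction can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the Monsoon hardware or the MDP code: the class and method names are hypothetical, memory is modeled as a dictionary, and emptying the slot after a match is an assumption made so that the location can be reused.

```python
from typing import Callable, Dict, Optional, Tuple

Token = Tuple[int, int, int]  # (s, c, v): instruction offset, context value, data value


class ExplicitTokenStore:
    """Toy model of ETS matching for dyadic (two-input) instructions."""

    def __init__(self) -> None:
        self.slots: Dict[int, int] = {}  # a missing key models an empty location

    def receive(self, token: Token, op: Callable[[int, int], int]) -> Optional[int]:
        s, c, v = token
        location = c + s                 # context value plus instruction offset
        if location not in self.slots:   # first argument to arrive: store it
            self.slots[location] = v
            return None
        other = self.slots.pop(location)  # second argument: empty the slot
        return op(other, v)               # and execute the instruction


if __name__ == "__main__":
    store = ExplicitTokenStore()
    multiply = lambda a, b: a * b
    print(store.receive((2, 100, 3), multiply))  # first token is stored
    print(store.receive((2, 100, 4), multiply))  # second token fires the multiply
```

The first call returns nothing because only one operand is present; the second finds the stored operand and produces the product.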

Last  year,  as  a  UROP,  I  designed  a  method  to  use  the  explicit  token  store  on  the  MDP 
[Spertus 1989]. Its only similarity to my new system is that the message words are:

1.  Instruction  address,  s. 

2.  Context  value,  c. 

3.  Data  value,  v. 

Figures 5-1 and 5-2 show code for the -1 and multiply nodes, respectively, from the dataflow
graph in Figure 4-2; they provide examples of monadic and dyadic nodes. The cfuture fault handler
is only two lines long and is shown in Figure 5-3. Bill Dally played a major role in developing
these templates. For further details, such as the calling convention, see [Spertus 1989].

The  only  benchmark  for  this  system  is  factorial.  It  took  431  ticks  to  compute  4!,  compared 
to the new system’s 1263. The comparison is fair even though it is between hand-compiled
and  machine-generated  code,  as  transforming  dataflow  nodes  is  straightforward.  Actually, 
the  comparison  is  unfair  in  the  other  direction,  because  so  much  intelligent  effort  has  been 
put  into  the  hybrid  system.  If  I  had  spent  the  past  year  studying  how  to  improve  the  ETS 
code,  such  as  by  discovering  how  to  combine  a  few  instructions  with  known  orderings  into  a 
single  macro-dataflow  node,  this  technique  would  surely  surpass  the  performance  of  the  hybrid 
system,  especially  because  it  is  already  better. 

*While Iannucci’s method uses an explicit token store also, the schema I am presenting is more trivially
based  on  Papadopoulos’  ideas. 
; Subtract 1 node
lactl.nodeS: 

        move    [2,A3], R1
        sub     R1, 1, R1
        DC      MSG:(fact1_node7_left<<sys_len_bits)+3
        send2   3, R0, 0
        send2e  [1,A3], R1, 0
        suspend


Figure 5-1: A Monadic Node Using ETS. The data value is taken from offset two in the
message,  and  the  constant  1  is  subtracted.  The  result  is  sent  to  the  left  input  of  node  7  on 
processor  3. 


; Multiply node
lactl.nodeS: 

        move    [1,A3], R0      ; Put data_addr in A2
        move    R0, A2
        move    [2,A3], R1      ; Get new argument
        ; This line may fault
        mul     R1, [0,A2], R1
        DC      MSG:(fact1_node9_right<<sys_len_bits)+3
        send2   6, R0, 0
        send2e  [1,A3], R1, 0
        ; Clean up
        stag    R0, CFUT, R0
        move    R0, [0,A2]
        suspend


Figure  5-2:  A  Dyadic  Node  Using  ETS.  If  the  other  argument  has  not  arrived  yet,  a  fault 
occurs  instead  of  a  multiply.  The  fault  handler  will  write  the  new  argument  to  the  faulted 
slot. If the other argument is already there, the multiply proceeds, and a token is sent to node
9,  after  which  the  slot  must  be  emptied  if  the  frame  is  to  be  reused. 
; cfuture handler
fault_cfut_loc:

        move    R1, [0,A2]
        suspend

Figure  5-3:  The  Cfuture  Handler  for  ETS.  It  simply  moves  the  new  argument,  guaranteed  to 
be  in  Rl,  into  the  slot  reserved  for  the  argument. 

5.5  Conclusion 

Even  though  I  do  not  think  the  system  good  enough  to  justify  continuing  dataflow  research 
on the J-Machine by building on it, I consider the experiment with the hybrid architecture to
be  a  success.  In  addition  to  the  successful  results  described  in  the  analysis,  there  were  several 
other  successful  aspects  to  the  project: 

•  By  working  with  both  the  Computation  Structures  Group  and  the  Concurrent  VLSI 
Architecture  group,  I  was  able  to  help  cross-fertilize  two  groups  that  have  very  different 
outlooks  on  the  same  problem,  parallel  computation.  MIT  has  been  criticized  for  not 
having  enough  communication  between  groups. 

• By stretching the J-Machine in ways its designers never imagined, I have found some
of  its  limits.  While  this  does  not  mean  the  J-Machine  is  flawed  or  necessarily  should 
be  changed,  its  architects  should  keep  aware  of  what  trade-offs  they  have  made  and 
reconsider  them. 

•  In  the  process  of  building  my  compiler,  I  have  built  utilities  that  will  convert  among 
different formats of MDP code. This should aid other J-Machine programmers in future
work. 

• There have still been few enough MDP coders that I have significantly increased the
number  of  hours  spent  MDP  hacking.  I  have  helped  contribute  to  the  set  of  known  neat 
hacks  for  the  J-Machine  (such  as  with  the  code  in  Figure  3-6). 

•  By  proving  the  feasibility  of  parallelizing  iterations  of  a  loop  and  presenting  ideas  on 
how  “straight-line”  Id  code  could  be  better  converted,  I  have  made  a  powerful  case  for 
the possible efficiency of dataflow program execution on the J-Machine.

In  conclusion,  I  consider  the  project  very  worthwhile  and  am  optimistic  about  the  results  of 
further research on dataflow computation on the J-Machine.
Appendix A


MDP  Program  Examples 


A.1 MDP Code for Factorial


aodnl*  FACT 


;((:L1BSL  (;LmUL  (tSYNBOl.  ;Sq-l)))) 

SQ.l! 

KOVB  Cl,i3],  R3 
KaVB  K3,  A3 

i((:K0VB  (tUTBRAI.  (:C0DB-BL0CK  sFACT))  (tFUME  (:BASB  13)))) 

HOVB  13,  B1 

CALL  LOOKUP.VECTOa 

DC  {FACT.cod«bloek.zal> 

RQTB  to.  Cl3,A3] 

;((:COITIIUB-TBST  (:FtAlfE  (;BASB3)  :SUSPEISIVE)  (:LITERAL  (tSYRBOL  :Sq-3)))) 
DC  {Sq,ajU5.r«f> 

HOTB  3,  11 
CALL  CITT.VBCTOt 

;((:COITIIUB-TEST  (:FtAKB  (:BASBO)  ;SUSPEISIVE)  (;LlTEtAL  (:SY1(B0L  :Sq-ll)))) 
DC  ■tSq.lljM*.r*f> 

ROVE  0.  tl 
CALL  CETT.VBCTOB 

;((:UBEL  (-.LirERAL  (:SrHBaL  .-SBID-tESULT-O}))) 

SBID.tESULT.O: 

ROVE  Cl, A3].  B3 
FOVE  13,  A3 

;((:RaVE-tEHOTE  (:FFaKB  (:BASB  0)  :SUSPEISIVE) 
i  (: LITERAL  (:IITEaEt  1)} 

;  (:FRAHB  (:BA3E  6)  : SUSPEISZVE) ) > 

SUSPEBSIVB4ee3: 

ROVE  Cl. A3],  R3 
ROTE  13,  A3 

DC  <SUSPEISIVE4ea3_ug_r*l} 
mo  Co, A3].  R3 
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ITiO  [6.13],  as 
SnDO  [0.13] 

BC  {10ClL.JIOVB_Mf_r*f> 

snso  ao 

SEIDO  [0.13] 

SBIDO  1 
SEIDEO  [6.13] 


;((;MOVE-aEHOTE  (:FaiI(E  (:B1SE  0))  (aiTEBlL  (:IITBGEao)) 

:  (;FaillB  (:B1SE  6)  :SUSPEISIVE))) 

SQSPBESIVB4689: 

ROVE  [1.13].  as 
ROVE  as.  13 

DC  •[SUSPBISIVE4689^g_r«f> 

arie  [6.13].  as 

SEIOO  [0.13] 

DC  {LOClL_HOVa^g.r*f> 

SEEDO  ao 
SBIDO  [0.13] 

SEIDO  0 
SEIDEO  [E.13] 

:((:TEBMII1TE)) 

SUSPEID 

i((:LlBEL  (:LITEaiL  (:SyMBaL  :Sq-ll)))) 

sq.ii: 

ROVE  [1.13].  as 
ROVE  as.  13 

:(<:TBST--3  (:FaiKB  (:B1SB0)  :SUSPEISIVE)  (tPaiME  (:B1SE7)  :SUSPEISIVE) 
i  (:FaiKE  (:B1SB  6)))) 

3USPBISIVE4696: 

ROVE  [1.13].  as 
ROVE  as.  13 

DC  <SUSPEISIVE4696jB«g.r«<> 

iTie  [0.13].  as 

ITIG  [7.13].  as 
ROVE  6,  ai 
CILL  LOOEUP. VECTOR 
ROVE  tn*.  as 
ROVE  as.  [6.13] 

;((:TEBI(II1TE)) 

SUSPEID 

;(<:L1BEL  (:LITEB1L  (:SnCBOL  :Sq-3})}) 

Sq.3: 

ROVE  [1.13].  as 
ROVE  as.  13 

;((:<•  (:FaiMB  (:B1SB  3)  .-SUSPBISIVE)  (rLlTEElL  (:IlTBGEa  1)) 

;  (:FaiHB  (:B1SB  4}))) 

SUSPBBS1VE4703 : 

ROVE  [1.13].  as 
ROVE  as.  13 

DC  {SUSPEISIVE4703_Mg_r«T} 

aiio  [3.13],  as 

HOVE  4.  ai 
CUi  LOOKUP, VECTOR 
ROVE  [3.13],  as 
LB  as,  1,  a3 
HOVE  a3.  [4,13] 
;((:BRANCH-FALSE (:FRAME (:BASE 4)) (:LITERAL (:SYMBOL :ELSE-4))))
HOVB  i;4,i3].  B3 
BT  B3,  3 

DC  {KLSB.4.ip.p«f> 

HOVB  BO,  IP 


;((:MOVE-IDEITm  (:PB1HE  (:B1SE3)  :SUSPEISIVE)  (;PUJ(S  (fBiSEB)))) 
SUSPEISIVE4710: 

HOVB  [1,13],  &3 
HOVB  R3.  A3 

DC  {saSPBISIVB4710jus_r«f> 

BTAO  [3. A3],  B3 
HOVB  6,  B1 
CAU,  LOOKUP.  VECTOB 
HOVB  C3,A3].  B3 
HOVB  B3,  [6, A3] 

;((:HOVB-IDEmTY  (:FBAHE  (:BAS8  4))  (:FBAHE  (:BASE7)))) 

HOVB  7.  B1 
CALL  LOQKUP.VECTOR 
HOVB  [4. A3],  B3 
HOVE  B3,  [7, A3] 

;((:BBAKCH  (:LITERAL  (.-STHBOL  :BID-lF-4)))) 

DC  -{BID.IF.A.ip.r^i} 

HOVB  RO.  IP 

j((:LA»EL  (:LITERAL  (jSrHaOL  :ELSE-4)))) 

KLSB.4: 

HOVB  Cl, A3],  R3 
HOVB  R3,  A3 

:(<!-  (:FRAHE  (:BASE  3)  :SUSPEISIVE)  (:LITERAL  (iIITEGER  1)) 
i  (iFRAKE  <:BASE  14/))) 

SUSPEISIVE4718: 

HOVB  Cl, A3],  R3 
HOVB  R3,  A3 

DC  •CSUSPBISIVE4718.Mg.rrf} 

RTAG  C3,A3],  R3 
HOVE  14,  R1 
CALL  LOOKUP. VECTOR 
HOVE  C3,A3],  R3 
SUB  E3,  1,  il3 
HOVb  R3,  C14,A3] 

■,((:canims  (.LITSIUL  (.-STHBOL  #:SQ4«74)))) 

HOVB  IBR,  R3 
SBIDO  R3 

DC  •CSq4a74j»«g.rrf) 

SBIDO  RO 
SEIOBO  A3 

;((;GET-COITEXT  (:FRAKE  (:BASB  13))  (-.FRAHE  (:BASE  10)))) 

HOVE  A3,  R3 

VTAG  R3,  IIT,  R3 

LSB  R3,  6,  R3 

HOVB  HR,  R3 

ADD  R3,  R3,  R3 

VTAG  R3,  FD,  R3 

■•ndO  Cl4,A3]  ;  •zgnaast 

DC  •aOCAL.GBTC.Mg.rbf} 

SEIDO  RO 
SEIDO  Cl 3, A3] 
SKIDO  K3
SBIDBO  10 

;((:SPECIlL-TBST-i  (:F&AME  (:BiSB  10)))) 

SUSPSISIVB4739: 

HOVB  [1,13],  R3 
HOVE  33.  AS 

DC  {SUSPEISIVE47S9_Bag_r«t> 

ETA6  [10. AS],  33 

;((:IIDEZ-CUB3SIT-C0ITEXT  (:LITE3A1.  <:BASE  B))  (:3ESISTE3  0))) 

HOVB  AS.  33 
VTAG  33.  IIT,  33 
HOVE  30.  33 
DC  8193 

ADO  33.  30.  33 
LSE  33,  6.  33 
HOVB  113,  31 
ADD  33.  31,  33 
VTAG  33.  FO,  33 
HOVB  33.  [O.AO] 

;((:HOVB-BEHOTB  (:F3AKE  (:BASE  10))  (;LITE3AL  (:IITEGE3  0)) 

;  (:3EGISTB3  0))) 

SBIDO  [10. AS] 

DC  {LOCAL.HOVB.msg.raT} 

SBIDO  30 
SBIDO  [10. AS] 

SBIDO  0 
SEIDBO  [O.AO] 

i((:HOVB-BEMOTB  <:F3A11E  (:BASB  10))  (aiTERAL  (:IITKGE3  3)) 

;  (:F3AKB  (;BASE  14)))) 

SEIDO  [10, AS] 

DC  •a0CAL_H0V3_»«g.r«T> 

SBIDO  30 
SEIDO  [10, AS] 

SBIDO  3 
SEIOEO  [14, AS] 

;((:TE3HIIATE)) 

SUSPEID 

;((:I.ABEL  (:LZTE3AL  (.-SYHBOL  «.Sg4e74)))) 

804874: 

HOVB  [1,A3],  33 
HOVE  33.  AS 

;((:COITIIUB-TBST  (;F3AHE  (.-BASE  3)  .-SUSPEISIVE)  (:L1TE3AL  (:SYHBOL  :Sq-E)))) 
DC  <Sq.S_ug_T«T> 

HOVE  3,  31 
CALL  CITT.VECT03 

;((:COITIIUB-TEST  (:F3AHE  (:BASB8)  :SUSPEISIVE)  (iLITERAL  (:SYKBOL  ;Sq-8)))) 
DC  •[Sq_8_aag.T«T> 

HOVB  8,  31 
CALL  CITT.VECT03 

;{(:LABEL  (: LITERAL  (: SYHBOL  : EID-IF-4) ) )) 

EID_IF_4: 

HOVB  [1,A3],  33 
HOVB  33,  AS 

;((:TERHIIATE)) 
SUSPEND


:((:L1BXL  (:LZTBX1L  (iSYXBOL  iSQ-S)))) 

Sq_8: 

HOVE  Cl.i3].  t3 
ROVB  13.  i3 

:(<:TBST-1  (:PRAM8  (-.BASS  8)  rSUSPBISIVB)  (:B£GISTBS  0))) 

SUSPBISIVE4741 : 

HOVB  Cl.i3],  K3 
HOVB  B3,  13 

DC  {SDSPBBSIVB4741_)ug.r««} 

BTIS  [8,13],  B3 
MOVE  «rn«,  B3 
HOVB  13,  [0,10] 

:((:>STUU-COrrBZT  (:PaiHB  (:B1SB  10)  :3USPBISIVK)  (:FB1HB  (:B1SB  11)))) 
SUSPBISIVE474e: 

HOVB  [1,13],  B3 
HOVB  B3,  13 

DC  {SUSPEBSIVB4748_»ag.z«<> 

BTIO  [10,13],  B3 
HOVB  11,  B1 
CIU.  LOOKUP.VECTOR 
HOVB  tnia,  13 
HOVB  13,  [11,13] 

:((;HOn-ZBtmTI  (.-PUKB  (.-BISB  H))  (:F11HE  (:B1SB7)))) 

HOVB  7,  11 
CILL  LOOKUP. VECTOl 
HOVB  [11,13],  13 
HOVB  13,  [7,13] 

;((;TBBHII1TE)) 

SUSPEID 

;((:L1BEL  (sLITBUL  (;SirHBOL  ;Sg-S)))) 

Sq.S: 

HOVB  [1,13],  13 
HOVB  13,  13 

:((:•  (:F11HB  (:B1SB3)  :SUSPEISIVB)  (:F11HE  (iBlSKB)  tSUSPEISIVE) 
i  (:F11HB  (:B1SB  13);)) 

3USPBISIV847E3: 

HOVB  [1,13],  13 
HOVB  13,  13 

DC  {SUSPEISIVE4763.Mg_r*f> 

1710  [3,13],  13 
1710  [9,13],  13 
HOVB  13,  11 
CILL  LOOKUP. VBC701 
HOVB  [3,13],  13 
HUL  13,  [9,13],  11 
HOVB  11,  [13,13] 

;((:H0VB-IDEm7Y  (:F11MB  (;B1SE  13))  (;F11KE  (;BlSBe)))) 

HOVB  8,  11 
CILL  LOOKUP.VECTOl 
HOVB  [13,13],  13 
HOVB  13,  [8,13] 

;((:7B1HII17E)) 

SUSPEID 

end
p«f  SUSPBIS1VB47B3  ■  MSG:  (((SUSPBISIVB47B3+F4CT_loe)«10))'f2

p»f  SUSPUSIVS4746_ug_r*f  »  HSG:  (((SUSPEISIVE474«+FiCT_loc)«10))+2 
r«f  SUSPBISIVE4741_**g_P*f  ■  MSG:  (((3USPKISIVE4741+FACT_loc)«10))+2 
rrf  Sq_8_wg.r««  -  MSG:  (((Sq_8+FiCT.loc)«10))*2 
r»f  Sq_G_Mg.r«f  «  MSG:(((Sq.B*FlCT_loe)«10))*2 

p«f  SUSPEISIVS4729jo«g_r«f  »  MSG:  (((SUSPEISIVB4729+FiCT_loc)«10))+2 
x»i  Sq4674_Mg_z»7  «  MSG:(((Sq4674+FlCT.loe)«10))+2 
r«f  SUSPBISIVE4718_»«g_r«f  »  MSG:  (((SUSPEISIVE4718+FiCT.loc)«10))+2 
S0SPElSlVB4710^g_r»f  -  MSG:  (((SUSPBHSlVE4710+FACT_loc)«10)>+2 
r«f  SUSPBlSIVB4702_Mg_r«f  »  MSG:  (((SUSPEISlVB4702+FACT.loe)«10))*2 
xti  Si;SPBISIVB489S_wlg_P«f  «  MSG:  { ({SUSPEISIVB469S*FACT_loc)«10))+2 
SUSPBISI?K4889j»g.p«f  »  MSG:  <((SUSPBISIVB4889*FACT.loc)«10))^2 
SUSPEISIVE4883_»*g_r«f  -  MSG;(((SUSPBISIVB4«83+FACT_loc)«10))+2 
rrf  Sq_ll^g.r*l  ■  MS6:(({Sq_114FACT_loe)«10))+2 
r«<  Sq.2jug_r«f  -  MSG:  <«Sq_2+FACT.loc)«10))>2 
prf  BID.IF,4.ip_p«f  -  IP:(((BID_IF.4+FACT.loe)«10)>+ABSOUJTB 
r«<  BLSB_4.ip.P«f  ■  IP:  <((KLSB.4+FACT_loe)«10))+ABS0LHT8 
p«f  FACr_eod*bleek_r«f  -  CB:  (FACT.loe«ie)+lB 
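
The reference definitions that close this listing share one pattern: a label offset is added to the module's load address, the sum is shifted left by 10 bits, and a small constant is added in the low bits. The codeblock reference instead shifts the load address by 16 bits and adds a small value, which matches the CB layout (address in the high half, frame size in the low half) described in Appendix B. The sketch below restates that arithmetic in Python rather than MDP assembly. The function names, the reading of the low field as a length or frame size, and the sample addresses are assumptions made for illustration; the MSG, IP, and CB tags are not modeled.

```python
LEN_BITS = 10  # shift applied to code addresses in the MSG references


def msg_ref(label_offset, module_loc, low_field):
    """Pack ((label_offset + module_loc) << 10) + low_field.

    low_field is the small constant added in the listing; treating it as a
    message length is an assumption of this sketch.
    """
    if not 0 <= low_field < (1 << LEN_BITS):
        raise ValueError("low field must fit in the low 10 bits")
    return ((label_offset + module_loc) << LEN_BITS) + low_field


def unpack_msg_ref(word):
    """Split a packed reference into (code address, low field)."""
    return word >> LEN_BITS, word & ((1 << LEN_BITS) - 1)


def cb_ref(module_loc, frame_size):
    """Pack a codeblock word: address in bits 31..16, frame size in bits 15..0."""
    if not 0 <= frame_size < (1 << 16):
        raise ValueError("frame size must fit in 16 bits")
    return (module_loc << 16) + frame_size


if __name__ == "__main__":
    # Hypothetical values: a label at offset 0x20 in a module loaded at 0x400.
    word = msg_ref(0x20, 0x400, 2)
    print(hex(word), unpack_msg_ref(word))
    print(hex(cb_ref(0x400, 13)))
```

Keeping the low field below 1024 is what allows the address and the constant to be separated again with a shift and a mask.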
A.2 MDP Code for Fibonacci


module FIB


;((:UBSI.  (:LITKB1L  (:SYIIBOL 

sq.i: 

HOTI  B3 

lOVB  K3,  la 

:((:HOVK  (aiTBBlL  (:CODB-BLaCZ  :FIB})  (iFim  (:BASB  18)))) 

HOVB  18.  K1 

cm-  LOOEnP.VBCTOK 

DC  '(FIB.eod«bleek.r«f> 

MOVK  to,  [18,13] 

:((:C0miUB-TK3T  (:FtlIS  (:B1S8  3)  sSUSPEISIVS)  (:UTEtlL  (tSYMBOL  :sq-a))}) 

DC  •Csq.a.Mg.r.f} 

HOVE  3,  tl 
CALL  CITT.VBCTOt 

j((:COmiUB-TEST  (:FKAXE  (;BASBO)  :SUSPEIS1VE)  (:LITERAL  (jSYMBQL  !Sq-17)))) 

DC  {Sq.l7jMg.r«f> 

HOVE  0,  tl 
CALL  CITT.VBCTOt 

;((:LABEL  (.-LITBtAL  (.-SYMBOL  ;SBfD-t£SULT-0)))) 

SEID.tBSULT.O: 

MOVE  [1,13],  t3 
MOVE  t3,  A3 

;((;MOVB-tEMOTB  (:FIIAME  (:BASB  0)  :SUSPEISIVB) 

;  (:LITEKAL  (aiTEGEH  D) 

:  (iFtAME  (:BASB  «)  iSUSPBISIVE))) 

SUSPEISIVE3S03: 

MOVE  [1,A3],  t3 
MOVE  t3,  13 

DC  'CSUSPEISIVSaSOS.Bsg.raO 
tTA6  [0,13],  t3 
tTAS  [8,13],  t3 
SEIDO  [0,13] 

DC  -aOClL.HOVB.ug.raO 
SEIDO  to 
SEEDO  [0,13] 

SEIDO  1 
SEIDEO  [6,13] 

;((:MOVE-IBMOTE  (.-FtlME  (.-BASE  0))  (.-LITEtlL  (:IITECEt  0))  (:FIUME  (:E1SE  6)  : SUSPEISIVE) } > 
SDSPEISIVE3E09: 

MOVE  [1,13],  t3 
MOVE  13,  13 

DC  {SUSPBISIVBaS09.MS.T<f} 
tTia  [6,13],  t3 
SEIDO  [0,13] 

DC  {LOClL.MOVK.ug.raf] 

SEIDO  to 
SEIDO  [0,13] 

SEIDO  0 
SEIDEO  [6,12] 

;((:TIIMII1TE)) 




SUSPBID 


;((:L1BQ.  (:LITERAL  (; SYMBOL  :Sq-17)))} 

SQ.IT: 

MOVB  Cl.iS].  B3 
MOVB  t3,  i3 

;(f:T8ST-5  (:9iSB  '')  -SOSPEISIVB)  (; FRAME  (:BASB  7)  :SUSPEISIVK)  (: FRAME  (:BASE  6)))) 

SUSPBISIVB3E16: 

MOVB  C1.A3],  R3 
MOVB  R3,  A3 

DC  {SUSPElSIVB3S16^S_r«<> 

BTAO  (0,A3],  R3 
RTAO  [7. A3],  R3 
MOVB  S,  R1 
CALL  LOOEUP.VBCTOR 
MOVB  tna,  R3 
MOVB  R3,  [S,A3] 

;((:TBRMIIATB)) 

SUSPBID 

;((:LABEL  (:LITERAL  (: SYMBOL  :Sq-3)))) 

Sq.3: 

MOVE  [1,A3].  R3 
MOVB  RS,  A3 

;<(:<-  (:FRAMB  (.•BASS  3)  ; SUSPEISIVE)  (:LITERAL  (:IITEGBR  1))  C:FRAME  (:BASE  4)))) 
SUSPEISIVB3E33; 

MOVE  C1.A3],  R3 
MOVB  R3.  A3 

DC  •CSUSPBISIVB3E33^g_r«f> 

RTAO  [3, A3].  R3 
MOVB  4,  B1 
CALL  LOOEUP.VBCTOR 
MOVB  C3,A3],  R3 
LB  R3,  1,  R3 
MOVB  R3,  C4.A3] 

;((:BRAICH-FALSB  (:PRAMB  (:BASB  4)}  (iLITERAL  (.-SYMBOL  ;ELSB-4)))> 

MOVE  [4, A3],  R3 
BT  R3.  3 

DC  {BLSB_4.ip.r«f} 

MOVB  RO,  IP 


;((:MOVB>IDEITITY  (:FRAMB  (:BASB3)  :SUSPEISIVB)  (:FRAME  (:BASEe)))) 
SUSPBISIVB3E30: 

MOVB  Cl. A3],  R3 
MOVB  R3.  A3 

DC  {SUSPBISIVE3G30^g_r«Y} 

RTAO  [3. A3],  R3 
MOVB  a,  R1 
CALL  LOOEUP.VBCTOR 
MOVB  [3, A3],  R3 
MOVE  R3,  [6,13] 

:((:B0VB-1DEITITT  (:FRAME  (:BASE  4))  (:FR1ME  {:B1SE7)))) 

MOVB  7,  R1 
CALL  LOOEUP.VBCTOR 
MOVB  [4, A3].  R3 
HOVE  R3,  [7, A3] 

;((;BRAICB  (:LITBRAL  (-.SYMBOL  :EID'IF-4) ) ) ) 
DC  {END_IF_4_ip_ref}
HOVX  to.  IP 


:((:UiBS.  (:LITBUL  (:STXBQL  :ELSE-4)))) 

ILSK.4: 

HOVE  E3 

ROVE  113,  13 

(:FE11CB  (:B1SB  3)  :SUSPBIS1VB)  (:LITBiaL  (sIlTEGEE  D)  (:FE1XE  (:B1SE  30)))) 
SnSFEI5IVE3B3S: 

HOVE  [1.13],  E3 
HOVE  E3.  13 

DC  {SUSPEISZVE3S38^g_r«f> 

ETia  [3,13],  13 
ROTE  30,  El 
CILL  LOQEaP.VECTOB 
ROVE  [3,13],  E3 
SOB  E3.  1.  E3 
ROVE  13,  [30,13] 

(:FE1HE  (sBlSE  3)  -.SttSPBISIVE)  (;LITBR1L  (:1ITBGEE  3))  (:FE1HS  (:B1SB  19)))) 
SUSPBISIVB3E44: 

ROVE  [1,13],  E3 
ROVE  E3.  13 

DC  'C3USPEISIVB3E44jug_r«f> 

ET19  [3.13],  E3 
ROVE  19,  El 
CILL  LOOKUP.VECTOR 
ROVE  [3,13],  E3 
SUB  E3,  3.  E3 
HOVE  E3.  [19,13] 

;((:CamiUE  (iLITEElL  (iSYRSOL  «;SQ3490)))} 

ROVE  lEE.  E3 
SSMDO  E3 

DC  {Sq3490_ug.r«f> 

SBIDO  EO 
SBIDBO  13 

:((:«ET-CQBTEZT  (:PE1RB  (:B1SE  18))  (:FUHR  (:B1SB  13)))) 

ROVE  13,  E3 
VTIO  E3,  IBT,  E3 
LSH  E3,  e,  E3 
ROVE  IIE,  E3 
IDD  E3,  E3.  E3 
VTIO  E3.  FD,  E3 
;  aak«  ap  dastlaatlon 
■0T«  [19,13] .El 
or  El,  E3,  EO 
■ad  El,  E3,  B3 
add  EO,  E3,  El 
■OT*  31,  EO 
and  El,  EO,  El 
saadO  El 
;  saadO  1 

DC  {LOClL_GBTC_BSg_raf> 

SBIDO  EO 
SBIDO  [18,13] 

SBIDO  E3 
SEIDBO  13 

;((:SPECtlL-TEST-l  <;FE1HB  <:B1SB  13)))) 

SUSPBISIVE3SES : 

ROVE  [1,13],  E3 
MOVE  13.  ia

DC  'CSUSPnSIVK3GSS^g_ra«> 

tria  [13.13],  u 

;((:IISSZ'CUUEIT-COITBZT  (:LITB&1L  (:B1SB  10))  (tKEGISTEK  0))) 

HOVB  13,  13 
VTIG  13,  IIT,  13 
HOVB  10.  13 
DC  10340 
lOD  13.  10.  13 
LSI  13,  6.  13 
HOVB  III,  11 
lOD  13.  11.  13 
VTIO  13.  PD.  13 
HOVB  13.  [0.10] 

;((:K0TB-1BH0TB  (:P11MB  (:B13B  13))  (:LITB11L  (tllTBGBl  0))  (tlBGISTEl  0))) 

SBIOO  [13,13] 

DC  [LOClL.HOVl.MS.raO 
SBIDO  10 
SBIDO  [13,13] 

SBIDO  0 
SBIDBO  [0,10] 

;((:BOVB-BEMOTB  (:F11HB  (:B1SE  13))  (iLITEElL  (:I1IEGE13))  (rPllME  (:B1SE  19)))) 
SBIDO  [13,13] 

DC  ■aoClL.KQVl.ug.raf} 

SBIDO  10 
SBIDO  [13.13] 

SBIDO  3 
SBIDBO  [19,13] 

;((:TB1KII1TB)) 

SUSPBID 

;((:L1BEL  (iLITEllL  (:SYMBOL  •:Sqa490)))) 

Sq3490: 

HOVB  [1.13],  13 
HOVB  13,  13 

i((:COITIIUB*TBST  (iPllHB  (jBISB  10)  iSOSPBISIVB)  (.'LITEllL  (.-SYHBOL  .SO-B)))) 

DC  {Sq_8_B»j.raf> 

HOVB  10,  11 
CllX  CITT.VBCTOl 

;((:COITIIUB  (:LITB11L  (: SYHBOL  «:Sq3494)))) 

HOVB  III,  13 
SBIDO  13 

DC  {Sqa494^g_r«f> 

SBIDO  10 
SBIDBO  l4 

;((:OBT-COITKXT  (:P11HB  (:B1SB  18))  (:FR1HE  <:B1SB  16)))) 

HOVB  13,  13 
VT16  13,  IIT,  13 
LSI  13,  e,  13 
HOVB  III,  13 
IDD  13,  13.  13 
VTIG  13,  FD,  13 
;  aaka  np  daatinatioB 
mor*  [30,13] .11 
or  11,  13.  10 
and  11.  13.  13 
odd  10.  13,  11 
aoT*  31.  10
and  II,  10,  II 
■•adO  II 

;  SnDO  1 

DC  O.OClL.GSTC^K.raO 
SnOO  10 
SBIDO  [18,13] 

SEIDO  13 
SnOKO  IB 

;((:SPBCIU.-TBST-1  (:ruiIB  (:BiSB  IE)))) 

S0SPKISIVZ3670: 

HO?B  [1,13],  13 
ROn  IS,  13 

DC  {SUSPBISIVB3S70juK.t«f> 

ITia  [IS, 13],  13 

.•((:IIOSX-COIIBIT-C01TBXT  (.-LITEIIL  (:B1SB  13))  (:IEGISTBI  0))) 
lOTB  13.  13 
VTia  13.  ZIT,  13 
HOVB  10,  II 
DC  13313 
IDO  13.  10,  13 
LSB  13,  8.  13 
HOVB  BIB,  10 
IDO  13.  10.  13 
VTia  13,  PO.  13 
HOVB  13,  [0,10] 

;((;KQVB-BEHOTB  (:PI1MB  (:B1SB  16))  (iLITEUL  (:IITEGEI  0))  (:IEGISTER  0))) 

SEIDO  [16,13] 

DC  {LOClL.HQVI.BSg.raf] 

SBfOO  10 
SEIDO  [16,13] 

SBIDO  0 
SBIDBO  [0,10] 

:((:H0VB-IEHQTB  (sFUHE  (:B1SB  16))  (sLITEUL  (;IITEGEI  3))  (:FB1HE  (:B1SE  30)))) 
SBIDO  [16,13] 

DC  {LOClL.HOVI.BSg.raf} 

SEIDO  10 
SBIDO  [16,13] 

SBIDO  3 
SBIDBO  [30,13] 

:((:TBBKII1TI)) 

SOSPBID 

:((:L1BEL  (:LITEI1L  (iSTHBOL  «:Sq3494)))) 

Sq3494; 

HOVB  [1,13],  13 
HOVB  13,  13 

;((:CamiUE-TBST  (:FB1KB  (:B15B  14}  .-SUSPBISIVE)  (aiTERlI.  (;SYnBOL  ;Sq-13)))) 

DC  '{Sq.l3_BSg_r«f} 

HOVE  14,  II 
CILL  CBTT.VECTOI 

i ((;COITIIUI-TEST  (:FI1KE  (:B1SB  13)  ;SUSPEISIVB)  (iLITEBlL  (iSYHBOI.  :Sq'13)))) 

DC  {Sq.l3jug_r«f> 

HOVB  13,  II 
CILL  CBTT.VECTOI 

:((:COITIIUB-TBST  (:FI1KE  (:B1SB  8)  :SUSPEISIVE)  (:LITER1L  (:SYHBQL  :Sq-14)))) 
DC  {Sq_14_msg_ref}

HOVI  8.  tl 

CALL  CITT.VBCTOE 


;((:L1BS.  (:LITEKiL  (rSYXBOI.  :EID-IF-4)))} 

BID.IF.4: 

HOVX  [1.13].  R3 
HOVB  B3.  13 

;((:1tBBHlllTB)) 

SUSPBID 

;((:L1BKL  (:LITXUL  (;SYHBOL  ;3q-14))}) 

Sq.l4: 

HOT!  [1.13],  B3 
mn  B3.  13 

i((:TBST-3  (iFKlHB  (:B1SB  8)  :3USPBISIVB)  (:F&1KX  (:B13B  9)  :SUSPEISIVB}  (:FKDIB  (:B1SE  16)))) 
303PBI3IVE3683: 

■OVX  [1,13].  13 
HOVB  13.  13 

DC  'CSUSPBISiyE3S83_M{_r«f} 

ITia  [8,13],  13 
ITlfi  [9,13],  13 
HOVE  18,  11 
CILL  LOOEtlP.VECTOl 
HOVB  tn«.  13 
HOVB  13.  [18,13] 

;((:HQVB-IDBITm  (iFllHB  (:B1SE  18))  (sFllKE  (:B1SE  7)))) 

HOVB  7.  11 
CILL  LOOIUP. VECTOR 
HOVB  [18,13],  13 
HOVB  13,  [7,13] 

;((:TBRXII1TB)) 

SUSPBID 

i((:LlBEL  (:LITE11L  (:SYHBOL  :Sq-13)))) 

Sq.l3! 

HOVB  [1,13],  13 
HOVB  13,  13 

:((:TBST-1  (;F11XB  (:B1SB  13)  :SUSPEISIVE)  (:1ECISTER  0))) 

SUSPBISIVB3S90: 

HOVB  [1,13],  13 
HOVB  13,  13 

DC  '{SUSPBI3IVB3E90^8_r«<> 

ITIO  [13,13],  13 
HOVB  tn«.  13 
HOVB  13.  [0,10] 

;((:1BTU1I-C0ITEXT  (:F11HE  (:B1SB  16)  :SUSPEISIVB)  (:FUXB  (:B1SE  8)))) 

SUSPBISIVB3696: 

HOVB  [1,13],  13 
HOVB  13.  13 

DC  {SUSPEISIVE369S_a<g.r«f> 

1T16  [16,13],  13 
HOVB  8.  11 
CILL  LOOIUP.VECTOR 
HOVI  trv.,  13 
HOVB  13.  [8,13] 

;((:TB1XII1TB)) 
SUSPEND


:((:L1B&  (:LITSUL  (tSnfBOL  :Sq-13)))) 

Sq_13; 

HOTS  [1.13].  S3 
HOVE  S3.  13 

;((:♦  (:PB1HE  (:B1SB  14}  .-SUSPEISIVS)  (;FS1ME  (:BASB  11)  :SUSPEISIVE)  (iFRlKE  (:B1SE  17)))} 
3QSPEISIVS3601: 

HOTS  [1.13].  S3 
HOTS  S3.  13 

DC  {S0SP8ISIVS3e01jue_r«f> 

STia  [14.13].  S3 
STIA  [11.13].  S3 
HOTS  17.  SI 
CILL  LOOXUP.VBCTOS 
HOTS  [14.13].  S3 
lOD  S3.  [11.13].  SO 
HOTS  SO.  [17.13] 

:((:HOVB-IOBIT1TT  (:FS1HB  (:B1SB  17))  (:FS1ME  (:B1SB  6)))) 

HOTS  6.  SI 
CILL  LOOlUP.VECTOB 
HOTS  [17.13].  S3 
HOTS  S3.  [6.13] 

:<(:TEKHII1TB)} 

SUSPBBD 


;((;L1BEL  (iLITBSlL  (:SyHBQL  :Sq-8)))) 

Sq.6: 

HOTS  [1.13].  S3 
HOTS  S3.  13 

i<(:TBST-l  (inUKS  (iBiSS  10)  : SUSPEISIVS)  (.-BBSISIER  0))) 
3USPEBSIVE3410: 

ROTS  [1,13].  B3 
ROTS  S3.  13 

DC  {3U3PEBSIVE3610_us_r«f> 

STIO  [10,13],  S3 
HOTS  tm«,  S3 
HOTS  S3.  [0.10] 

;((:SETUSB-CaiTEXT  (:FS1HE  (:B1SB  13)  :SUSPEISIVE)  (:FS1HE  (:B1SE  9)})} 
SUSPB13ITS3ei6: 

HOTS  [1,13],  S3 
HOTS  S3,  13 

DC  '[SU3PEB3ITB3eiS_Hg_r«f> 

STia  [13,13],  S3 
HOTS  9.  SI 
CILL  LOOIUP.VBCTOS 
HOTS  «xu«,  S3 
ROTS  S3.  [9,13] 

i((:TBSHIBlTB)) 

SOSPBID 

end

zaf  3USPBI3IVE361S^g_r«f  >  KSG:  (((SUSPEISIVE361S+FIB_lee)«10))'^3 
raf  3USPEISIVE3ei0^g_raf  >  HSG:  (  ((SUSPEISIVE3610'^FIB_loc)«10)  }-^3 
zaf  3U3PBI3IVB3601_Mg.zaf  >  MSG:  ( ((3USPEISIVE36014'FIB.loc)«10)  }-i-3 
zaf  3U3PBI3IVS3B9S^g.zaf  -  N3G:  (((SUSPE13IVE3B9SaFIB.loc)«10)  )-»^3 
zal  SUSPEISIVB3S90^g_zaf  -  MSG:  (((SUSPEISIVE3690'fFIB_loc)«10))+3 
zaf  SUSPBBSIVS3E83_«sg.zaf  •  M3G:  (((SUSPEI3IVE3S83-»FIB_loc)«10))+3 
zaf  Sq_14_Mg.zaf  ■  MSG:  (((3q.l4+FIB.loc)«10))+3 
r*f  sq.l3^s_z»f  -  HS«:<((sq.l3+riB.loe)«10))-»a
r«f  sq_ia^s_z*f  -  HSO;(((sq_ia-KFIB.loc)«tO))-t-2 

X9i  S1)SPBlSIVB3S70^g_z«f  -  MSG:  (((SUSPEISIVB3670+FIB_loc)«10})+3 
z«f  Sq34M_Mg.r«f  -  NS0:(((Sq34»4>PIB_lae}«10))-»a 
X9i  Sq.8jM8_r«f  -  HSG:(((sq.8+PIB.loc)«10))+3 

r*f  SOSPBISIVBasSSjMg_r«f  -  MSG:  (((SUSPEISIVE26B6+FIB_loc)«10))+2 
r*#  Sq3490j«g.r*f  -  MSG:  (((SQ2490+FIB.loc)«10))+2 
r*<  SUSPEISIVB2644_MS_r«i  -  MSG:  (((SUSPEISIVE2B44+FIB_loc)«10))+2 
r«f  SUSPEISIVE2B38^g.r«f  -  MSG:  ((<SUSPElSIVE2B38+FIB_loc)«10))+2 
r«f  S0SPEISIVB2B30^g.r«<  -  MSG:  (((SUSPEISIVE2B30+FIB_loe)«10))+2 
S0SPBISIVB2S22_wig_r«f  -  MSG:  (((SUSPEISIVE2S2a+FIB.loc)«10))+2 
X0t  S0SPgfSIVB3616^g.r««  -  MSO:  (((3VaPEMSIVS2SiSt-ni.loe)«10))*2 
r*f  S0SPBiSIV82B09^g_rrt  -  MSG;  («SUSPEISIVEaB09+FIB.l<'c''«l<5»+2 
SUSPBISIV8aS03^g.z««  -  MSG:  (<(SUSPEISIVSaS'33+FIB.Wj«10)>+2 
Xft  Sq_17^g.T««  -  MS0:{((Sq.l7^FIB_loe)«10))-»2 
z««  sq.3_ug.z«f  -  MSG:(((sq_3«FIB_lee)«10))>a 
x«f  EII>.IP_4_ip.r«*  •  lP:C<(EID_IF_4*FIB.loc)«10))+iBS0LUTB 
z«f  KLS8_4..ip_r«i  •  IP:  (((BLSK_4+FIB_loc)«10))+iBS0LUTK 
r«f  FZB_eo<l*blaek_r««  ■  CB:  (PIB.loe«18)-:'31 
A.3 MDP Code for Loop Example


This is a rewrite of the loop program with non-Iannucci structures.
Instead of having:

   /->|  |<->|  |<->|  |<-\
   \----------------------/

we use a setup where different iterations' iteration pointers are
contiguous:

   normal frame
   ptr to it. K-1      (Think of as "-1")
   ptr to it. 0
   ptr to it. 1
   ...
   ptr to it. K-1
   ptr to it. 0        (Think of as "K")
   subframe 0
   subframe 1
   ...
   subframe K-1

The message format will be:

   MSG:location of code
   ADDR:frame base
   ADDR:location of iteration pointer

A2 will be loaded with the frame base.
A1 will be loaded with the base of the subframe.
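
The layout described here (a normal frame, then one iteration pointer for each of the positions -1 through K, then K subframes) determines how large the whole frame must be, and the listing's total_frame_size label appears to compute exactly that count. The Python sketch below works through the arithmetic. It assumes the subframes are packed directly after the pointers, as the layout suggests; the function names and the sample sizes are invented for illustration and are not taken from the listing.

```python
def total_frame_size(frame_size, k, slots_per_iteration):
    """Words needed: normal frame + (k + 2) pointers + k subframes."""
    return frame_size + k * (slots_per_iteration + 1) + 2


def pointer_slot(frame_size, k, i):
    """Frame offset of the pointer for iteration position i, with -1 <= i <= k."""
    if not -1 <= i <= k:
        raise ValueError("iteration position must lie in -1..k")
    return frame_size + 1 + i


def subframe_base(frame_size, k, slots_per_iteration, j):
    """Frame offset of subframe j (0 <= j < k), assuming contiguous packing."""
    if not 0 <= j < k:
        raise ValueError("subframe index must lie in 0..k-1")
    return frame_size + (k + 2) + j * slots_per_iteration


if __name__ == "__main__":
    # Hypothetical sizes: an 8-word normal frame, k = 5, 6 slots per iteration.
    size = total_frame_size(8, 5, 6)
    last = subframe_base(8, 5, 6, 4)
    print(size, pointer_slot(8, 5, -1), pointer_slot(8, 5, 5), last)
```

With these sample sizes the last subframe ends exactly at the total frame size, which is a quick consistency check on the formula.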


; Times (including proc call overhead)

arg   k    time
0     2    325
1     2    460
10    2    1675
A     2    325+135*A
0     3    330
10    5    1690


A     K    315+135*A+5*K
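
The timing figures can be checked against a simple linear model in the argument and in k. The Python sketch below recovers the three coefficients from the measurements and confirms that every row fits. The table digits used here are readings of a degraded scan (rows taken as arg, k, time = 0, 2, 325; 1, 2, 460; 10, 2, 1675; 0, 3, 330; 10, 5, 1690), so both the data and the resulting coefficients should be treated as assumptions rather than as figures confirmed by the source.

```python
# (arg, k, time) rows as read from the timing table; the digits are assumed readings.
ROWS = [(0, 2, 325), (1, 2, 460), (10, 2, 1675), (0, 3, 330), (10, 5, 1690)]


def fit_linear_model(rows):
    """Return (base, per_arg, per_k) for time = base + per_arg*arg + per_k*k."""
    per_arg = per_k = None
    for a1, k1, t1 in rows:
        for a2, k2, t2 in rows:
            # Two rows with equal k isolate the cost per unit of arg.
            if per_arg is None and k1 == k2 and a1 != a2:
                per_arg = (t2 - t1) / (a2 - a1)
            # Two rows with equal arg isolate the cost per unit of k.
            if per_k is None and a1 == a2 and k1 != k2:
                per_k = (t2 - t1) / (k2 - k1)
    if per_arg is None or per_k is None:
        raise ValueError("rows do not determine both slopes")
    a, k, t = rows[0]
    return t - per_arg * a - per_k * k, per_arg, per_k


def predict(model, arg, k):
    base, per_arg, per_k = model
    return base + per_arg * arg + per_k * k


if __name__ == "__main__":
    model = fit_linear_model(ROWS)
    print(model)
    print(all(predict(model, a, k) == t for a, k, t in ROWS))
```

Under these readings each additional unit of the argument costs far more than each additional unit of k.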


labsl  LIBaAKY.PLACE>«180 

labsl  fraas.tias  •  0 
labsl  frBas_B_itsratioo_alots  *  6 
labsl  argnasat  •  10 
labsl  k  -  6 
label total_frame_size = frame_size + k * (frame_n_iteration_slots + 1) + 2
l«b«l  alotlO  •  totBl.fraaa.sis*  -  1 
label  alotE  -  2 

Inclnda  "libS.adp" 

; ;  Pxogxaa  coda 

label  C00S_PLlCEe«4OO 

aodale  progxaii_code 

:  ((:L1BEL  (:LITBUL  (:STXB0L  :Sg-l)}) 

a«uls 


mow 

[1,13],  RO 

mow 

EO.  12 

;  Alt«r«d  ord«r 

la  a  teaporaxy  klndge 

;  (:CITT 

(:TXm  (:B1SE  3) 

$ 

iSUSPBBSIVB) 

(:LITBR1L  (:SmaL  :Sq-4))) 

DC 

<aq_4_«ag.xef> 

aoT« 

3,  R1 

cmll 

CITT 

;  (;CITT 

(:FR1XB  (:B1SE  0) 

rSUSPEISIVE) 

(:LITER1L  (iSYHBQL  :Sq-3))) 

DC 

<aq.3_maE.re«> 

■ov« 

0,  R1 

eoll 

CITT 

moro 

ip.  »0 

mOT4 

RO,  [0,13] 

i  (:  LIBEL  (:  LITERAL  (:SnCB0L  :SEI0-RESULT 

aend.reaslt.O: 

mow 

[1,13],  RO 

moro 

RO,  12 

!  (:H0VR  (sFRAKE  (:B1SB  0) 

iSUSPBISIVB) 

(:LITBR1L  (:IITEGER  1}) 

(:FR111B  (:B1SB  7) 

;SUSPBISIVB)) 

DC 

{local .■oTx.aag.xaf} 

moro 

Cr.l2],  R1 

s«nd3 

[0,12],  RO,  0 

Bond 

[0,12],  0 

ootidlo 

1,  Rl,  0 

moro 

ip,  xO 

moro 

lO,  [0,13] 

moro 

[1,13],  RO 

wwro 

RO,  12 

(:H0VK  (:FIUMB  (;B1SB  0)) 

(iLlTBUL  (:IITBGER  0)) 
(;FK1MB  (;B1SE  6) 

:SUSPEI5IVE)) 
DC  {localjBoer.aai^.ref} 

■oee  [6,12],  El 
■•Bda  [0.i3],  to.  0

••ad  CO. A3],  0 

■•ad3«  0,  Kl,  0 


(.•mmiATB) 

■aapaad 

(:LABKL  (:LITBUL  (;  SYMBOL  :Sq-4))} 

BOT«  Cl >  A3] ,  BO 

aoT*  BO.  A3 

(:H0VB  (xFBAME  (:BASB  3) 

zSOSPBISIVB) 

(sFBAKB  (:BASK  4))) 
aav*  4,  B1 

call  LOOIUP 

aoT«  C3,A3],  B1 

BOT«  Bl,  Cd,A3] 

(:SUB  (tPBAMB  (:BASB  3)) 

(:LZTXB1L  <:»TBeBB  3}) 
(iBEOISTEB  0)) 

(:HaVB  (iLITEBAL  <:XITBSEB  9}} 
(•.BEGISTEB  1)) 

(;SaB  (:BEaiSTEa  1) 

(:LITEBAL  (;IITEGEB  0)) 
(•.BEGISTEB  3)) 

(:ADD  (:BEGISTEB  1) 

(:LITEBAL  (iIBTEGEB  6)) 

<i BEGISTEB  3)) 

(iSTPB  (: BEGISTEB  3)} 

(iSTCB  (-.BEGISTEB  D) 

(:STIX  (: BEGISTEB  3)) 


j  Pat  Bu«  ID  aaaory  into  A1 

DC  IIT:fruM_ais«<<ara.l*a-Dit> 

BOT*  A3,  B3 

•tag  B3,  IIT,  B3 

add  B3.  BO.  B3 

■tag  B3,  ADDB,  B3 

aota  B3,  A1 

;  Pat  k  into  Bl 
Bota  CalotE,A3],  Bl 

aab  Bl,  1,  Bl 

;  Pat  baa*  affaat  into  B3 
Bova  3‘^raBa_aiaa.  B3 

add  B3,  C3,A3],  B3 


:  Xa  thia  aatap,  0  throagb  k-3  gat  IH 
;  -1,  k-1,  k  do  not. 

;  Loop  tkrongh 
DC  IIT:aaakIH 

or  B3.  BO,  B3 

aora  1 .  B3 
loop_setup_loop:

■OT*  E3.  CU.il] 

It  K3.  Kl.  to 

kdd  U,  fxaa«.ii_lt«ration_slota,  R3 

add  R3,  1.  >3 

bt  to,  *laop_a«tnp_loop 

:  Star#  k-1  alot,  ate. 


DC 

ZIT:*(aaskZ]|  1 

1  BaakPC) 

%ad 

R3,  RO.  R3 

worm 

R3.  CR3.A1] 

1 

M 

mow 

R3.  CO.il] 

mow 

Cl. 11}.  RB 

ond 

R3.  RO.  R3 

add 

R3.  1,  R3 

mow 

R3.  [R3.A1] 

!i  (;I.iBSL  (:LITSUL  (.-SYKSOL  : SBTUP-LOOP-6) ) ) 

aatap.laap.S: 

<:BtZ  (.-RBCZSTER  0) 

;;  (:LITEUL  (iSTHBOL  :EIO-SETUP-LOOP-E))) 

ii  (:ZZZS  <:LZTBBiL  (:ZITB6BB  6)} 

(iPBlMB  (:IBZT-ZTBUTZOI  0))) 

;;  (iSOB  (:RBCZSTBa  0) 

(:LZTEKAL  (sZITBCEB  1)) 

(:RBGZSTER  0)) 

(:BB  (iLZTBRlL  (tSYHBOI.  :SBrUP-U0P-6))) 

;;  (:LiBEL  (.-LZTEaiL  (iSYIIBOL  lEID-SETUP-LOOP-E))) 

aad_aatap.leap_B ; 

(iSm  (:RBGZSTBB  3)) 

:i  (:ZZZD  (:LZTEEiL  (-.ZITEGER  «)) 

::  (:BBGZSTBB  4)) 

(:1ZZD  (;LZTBE11.  (iZITBGEB  0)} 

;;  (:  mXB  (.-ZTEUTZOI  0))) 

i  (:STZX  (:FRAIIE  (s  ZTEUTZOI  0)} 

;  (iLXnUL  (:BOOLUI  .-FALSE)} 

;  (:FRA1IB  (:ZTEUTZOI  0))) 

i  la  aaa  aehaaa,  tbla  aaaaa  aat  -I’a  ta  aara.  Daaa  abara. 

::  (:ZXZO  (:LZTEUL  (:  ZITEGER  6)) 

;;  (:RBGZSTBR  4)) 

;;  (:ZZZD  (:LZTERiL  (: ZITEGER  0)) 

(:FRilfX  (iZTEUTZOI  0))) 

(:STIZ  (iREGZSTER  1)) 

i  <:STPC  (.-FRIXB  ( .-lEZT-ZTSUTZOI  0)) 

i  (:LZTER1L  (sSYHBOL  :ZTEUTE-6)) 

;  (:FR11IE  ( :IEZT-ZTEUTXOI  0))) 

DC  ZITjaaakPC 
■AT*  1 , 

call  CUCI.im 

;  (zTnimiTB) 

•«ap«ad 

:  (:UBSL  <:LITEIUL  (iSYMBOL  :ZTBUTB-S}>> 

lt«rat«_B: 


Cl, A3],  B1 

■OT« 

Bl,  A3 

wtmg 

Bl.  ZTT,  Bl 

■0T4 

C3.A31.  B3 

add 

B3,  EraaM.aiaoAl , 

B3 

mow0 

CB3.A31,  B3  ; 

OEEaat  to  baa#  oE  onr  loop  anbEraao  non  la  B3 

Ish 

B3,  aTa.laa_blta, 

B3 

add 

Bl.  B3,  Bl 

wtmg 

Bl,  ADDB,  Bl 

aoTa 

Bl,  A1  : 

Baaa  of  ear  loop  aabEraao  in  A1 

(:R0VB  (:FK1XB  (:ITBBiTIOI  1) 

:iaiSTICKr  iSUSPBISIVB) 

(.•FBAMB  (:1TEUTIQI  S))) 

;  oHavt  6  la  intarnal  to  loop  and  aaad  not  bo  ebockod 
aoTO  CliAlli  BO 

■OTO  BO, 

DC  CFUT:tO 

■OTO  BO,  Cl, All 

<;LB  <:FBAXE  (sITBBATIOI  6)) 

(iPBAHB  (:BASB  4)) 

(iPBAHB  (:ITBBATIOI  4))) 
aoto  [6, All,  BO 

1#  BO,  t4,A3l,  BO 

■OTO  BO,  [4, All 

(sBBF  (sFBAME  (;ITBBATIOB  4)) 

(:LITEBAL  (:SnCBOL  sBIO-LOOP-S))) 

■or#  C4,All ,  BO 

bf  BO,  “aad.loop.B 

(:STPC  (:FBAIIE  (:BEZT-ITERATiai  0)} 

(;LZTBBAL  (.-SYMBOL  ;ZTB]UTB-B)) 

(:FBA]IE  (:BEZT-ZTEBATZai  0))) 

DC  ZBT:uakPC 

■OTO  1 ,  B1 

eaU  CBECE.ZTEB 

(;ADD  (:FBAIIB  (:ZTBBATZOI  B)) 

(iLZTEBAL  (:ZITBGEB  D) 

(:FBAKE  ( : BEZr-ZrEEATZOf  1))) 


■Ota 

C3,A3l,  Bl 

add 

El,  f raao_alaoTl«l ,  Bl  ;  nozt 

■OTO 

CB1,A21.  Bl 

add 

Bl,  1.  Bl 

;  offaat  1 

DC 

ZIT:'(aaakZH  1 

1  saakPC) 

and 

El.  EO.  El 

call 

LOOKUP 

■OTO 

CS.All.  EO 

add 

BO.  1.  BO 

;  latagar  1 

■OTO 

EO.  CB1,A3] 
(:AOD  (imn  (:ITSUTIOI  2} 

:IOISTICir  sSUSPSISIVK) 
(:PU1II  (;ITniTIOI  E)) 

(:PKlia  (:mT-ITmTIOI  2))) 
rtBK  C3,A1].  to 

add  21.  2-1.  21 

call  LOOKUP 

mor»  [2.11].  20 

add  20.  [E.ll],  20 

■ova  20,  [21.12] 

DC  CFUT:tO 

■OTa  20.  [2,11] 


aoaa  ip.  20 

aora  20.  [0.13] 

■oaa  [1.13].  21 

Bova  21,  12 

atag  21.  IIT,  21 

•OTa  [2.13],  22 

add  22,  f raaa_aiaa>l ,  22 

•OTa  [22,12],  22  ;  Offaat  to  baaa  of  ear  loop  aaMriaa  aoa  in  22 

lab  22,  sys.laa_1>lta ,  22 

add  21,  22,  21 

atag  21.  1DD2.  21 

BOTo  21,  11  i  Baaa  of  cox  loop  anbfraao  in  11 

(;TST1  (:F21KZ  (;ITBK1TI0I  3) 

:IQI3TICKY  :SUSPEISIVB) 

(sFKlHB  (:IEZT-ITEB1TI0I  3))} 
rtag  [3,11],  20 

DC  CFUTjIO 

aoaa  20,  [3,11] 

•oaa  [2,13],  21 

add  21,  fxaaa.aisa'^l'*'! ,  21  ;  aazt 

Boaa  [21,12],  21 

add  21,  3,  21  ;  offaat  3 

DC  IIT:*(BaakIH  I  BaakPC) 

and  21.  20,  21 

call  LOOKUP 

BOTO  trna,  20 

•OTa  20,  [21,12] 


;  (:STIH  (sFKUB  (:P2BVI0US-ITEK1TI0I  0)) 

;  <:LIT821L  (;800LE1I  :TKUB)} 

;  (:F21]IK  (:P2BVI0US-ITE21TI0I  0))) 

DC  IIT:BaaUH 

BOTO  -1  ,  21 

call  CBECK.ITE2 

;  (:TE2XII1TE) 

anapand 

;  (:L1BBL  (:LITE21L  (iSYMBOL  :EID-LaOP-E))) 

and.loep.S: 

BOTa  [1,13],  21 

•OTa  21,  12 
wtag  R1,  INT,  R1


■OT*  C3.A33>  U 

add  U,  f saaa.alsc't'l ,  U 

■ova  CK3>i33>  *3  i  0«aat  to  baa#  of  ear  loop  anbfrasa  noa  in  B2 
lab  K3,  a7a_las_bita>  U 

add  Kl.  K3,  >1 

atas  Bl.  iOOB.  11 

Boao  11.  il  :  Baaa  of  ear  loop  aabfraaa  in  A1 

(;CfTT  (sFliiat  (sZmUTZOI  3> 

isosmsm) 

(:LITXUL  {:8TI!B0L  ;C0PT-L00P-?A1IABLB-1))) 

DC  '(eopj.loop_aa>iablo.l_Bag_raf> 

■OTO  3.  11 

eaU  CITT.L00P 


(:liOVB  (.PlAn  (.-iniATIOB  S)) 
(:P1A1B  (:BA3I  6») 


6.  11 

LOOEtIP 

CS.Al],  10 

10.  C6.A2] 

;  (:T»linATB) 

aaapond 

;  (;tABEL  <;LirEKAL  <!SY1IB0L  :C0Py-I.00P-VAlIABL8-l)>) 

eopp.laap.aaxiablo.l : 

■OTO  Cl.  A3],  11 

■oao  11 ,  A3 

atas  11.  IIT,  11 


aoa# 

add 

aoaa 

lab 

add 

atac 

■oao 


C3.A3],  13 

13,  f rana.aiaotl ,  13 

[13. A3],  13  i  Offaat  to  baao  of  ear  loop  aabfraao  noa  in  13 

12,  aya.lon_bita,  13 
11.  13,  11 
11.  AODl,  11 

II,  41  ;  Baaa  of  ear  loop  aabfraao  la  Al 


(:CITT  (.'FlAJIB  (.-ITBIATIOI  3) 
iSOSPEISIVB) 

(:LITE1AL  (:STIIB0L  !C0FT-LQ0P-VARIABI.E-2))> 


DC 

{eopy.loop.aarial 

■OT« 

3,  11 

cmll 

CIIT_I,00P 

mo'w 

ip.  10 

10,  CO. A3] 

aoTtf 

Cl.  A3],  11 

11,  A3 

11,  IIT,  11 

BOT* 

C3.A3].  13 

•dd 

13,  fraao.aiaaal 

BOT# 

[13.  A3],  13 

Ish 

13,  aya.laa.bita 

•dd 

11,  13,  11 

wtag 

11,  AOOl,  11 

mo'f 

11,  Al 

Offaat  to  baaa  of  ear  loop  aabfraaa  noa  in  P' 


Baaa  of  ear  loop  aabfraao  in  Al 
:  (:W)VK  (;m]!B  (ilTBUTIOI  2) 

:  tIOISTZCIT  .-SUSPRISIVS} 

:  (:PUn  (:BiSK  7))) 


Ca.Ai],  to 

7,  tl 

eall 

LOOKUP 

■ov« 

Ca.Ai],  to 

to.  Cr.Aa] 

DC 

CFUT:tO 

aoT« 

to,  Ca.Ai] 

;  (inUnATB) 

■«sp«ad 

;  (:LABKL  (iLITBUL  (:STIIBQL  :COPT-LOOP-VlUiBLB-a))) 

eopy.loop.TBxlabla.a : 

■OT*  Cl >13],  tl 

mart  tl ,  13 

wttg  tt,  in.  tl 

aov«  C3.A31,  ta 

add  ta,  f raaa.aiaa-fl ,  ta 

■ora  Cta.Aa],  ta  ;  Oiift  to  baoo  ot  cur  loop  anbfraao  nor  in  ta 
lab  ta.  a7a_loa_bltB,  t3 

add  tl,  ta.  tl 

■tac  tl,  ADDt,  tl 

■ora  tl.  A1  ;  Baaa  of  ear  loop  anbfraao  in  A1 

;  (:H0VB  (:PtAMB  (iITBtATIQI  3) 

:  ilOISTICEY  ;SUSPBISIVB) 

i  (iPtAKB  (-.BASE  S))) 


rtmg 

C3.Aa],  to 

mor0 

8.  »1 

LOOKUP 

■OT* 

C3,A1],  to 

■QT« 

to,  CB.Aa] 

DC 

CFUT:$0 

uor0 

to,  C3.A1] 

;  (:ROVB  (.-LlTBtAL  (.-STHBOL  ;SIGIAL)) 

:  (iPtAIIB  (:BASB  B))) 

■ora  tmo,  tO 

■ora  to,  CB,A3] 

:  (:TKtIIZIATB) 

anapaad 

!  (:LABEL  (:UTBtAL  (: STHBOL  :Sq-3))} 

aq.3! 

■ora  Cl. A3],  tO 

■ora  to,  Aa 

;  (:K0VB  (:FKAHB  (:BASB  0) 

i  :SU5PBISIVE} 

:  (:rtAHB  (:IBZT-ITBRATIOI  3))) 

rtag  CO,Aa],  to  ;  Chock  if  ralna  thoro 

■ora  Ca,A3],  13 




•dd  Ut  f ,  U 

■OT*  [Ea.ial.  U 

add  U,  3,  31 

DC  ZIT; *(uaUH  I  uskPC) 

and  31 >  30,  31 

eaU  LOOKUP 

awva  [0,13],  33 

aeva  33,  [31,13] 

(iROTB  (:LZTBK1L  (:imGB3  D) 

(  (:r3m  (:mT-ITBKiriOI  1))) 

safe  31,  3-1,  31 

call  LOOKUP 

■OTS  1,  30 

■OTt  30,  [31,13] 

;  (:MOTB  (:LITSK1L  (:I1TBGEK  0)) 

;  (:FKm  (:IBXT-ITE31TI03  3))) 

add  31,  3-1,  31 

call  LOOKUP 

aoTS  0,  30 

■otra  30,  [31,13] 

;  (:TBKRmTB)) 

saspaad 

cad 

xaf  sq.3^g_r«<  ■  MSG:  < (sq,.3+C0DB.PLlCB)  «  sys.lca.fcits)  ♦  3 
raf  s(i_4_Bsc_raf  ■  MSG:  ((•q.4+CQDB.PLlCB)  «  ays.lan.feits)  ♦  3 

taf  eapy_loap_aariafela_l_msg_tafaMSG: ((eopy_laop_Tariafela.l*CODB_PLlCB)  <<  sys.lan.feits)  +  3 
ra<  eapy_loap_Tariafela_3_*sj_rafaHS6; ((eopy_locp.aaxiabla.3+C0D8_PLlCB)  <<  tya.las.bita)  +  3 
raf  itarata.S_asg.raf  ■  MSG: ((itarata.B+CODB.PLlCB)  «  aya.laa.feita)  ♦  3 
raf  loop.asg.raf  ■  MSG: ((itarata.S+CODE.PLlCB)  «  sys.laa.feits)  3 
raf  LOOP.CB  •  CB:  <(t^l+C0DB.PLlCB)«16)  ♦  total.fraaa.sisa 

plaea  prograa:_eoda ,  COOB.PLICB 

lafeal  TOP.PLICB  -  tSOO 

•0 

; ;  Top  lOTOl  coda 
aodola  top.eoda 

t {  Craata  tfea  fraaa 

i  First  sa  aost  alloeata  a  fraaa 

DC  {topl_raf> 

aoTa  30,  33 

DC  {loop_cfe> 

aeta  30,  31 

call  ILLDCITB 

top.l: 

aova  33,  13 

DC  FD:S600«10  ;  Hfeora  to  pat  raaalt 

aoto  30,  [0,13] 

aoTO  argaaaat,  31  ;  Irgaaant 

aoto  31,  [3,13] 

aoto  k.  31 

aoto  31,  [slotK,  13]  ;  K 

DC  MSG:  ((Sq.l-K;0DB_PLlCE)«ays.laa.bits)4'3 
■ov*  -1 ,  U 

■•ad3  0,  to,  0 

■•Ada*  i2.  t3,  0 
■aapaad 

•ad_of  _eod« : 

bz  *«nd_o<_eod* 


•ad 

r«<  topl.raf  -  IP:  ((tep.l-»TOP_PLlCE)«sys_l«a_blts)  *  iBSOLOTB 

plao*  libzarp.ooda,  LIBRAKT.PLiCI 
plaea  tep.eod*,  TOP.PLICB 


•1..3 

■odol* 

erg  TOP.PUCI 
bnay.loop : 

br  ‘bnay.loop 

end 
i  »  0 

•0 

ip  -  ip:(T0P.PLiC8«aya_lon_bi«s)  *  iBSOLUTE 

;nnteb  foteb  nil 

;ontelk  rood  arlto  r0..r3 

iwnteb  road  orito  a0..a3 

;*ateb  orlto  ;  qnono 

;>ateb  fanlt  all 

break  orlto  tSOl 

saoparato  on 

brook  fault  fault.typo 

break  fault  fault .llait 

break  fault  S  ;  draaorr 
Appendix B


MDP  Library  Code 


B.1 General Library


; This file holds the library for VIDI program execution on the J-machine.
; It puts it all in a module library_code.

; It includes and defines as necessary and loads the system call vector
; with the following (i for input, o for output):

; ALLOCATE (0) - Allocate a frame on the current node given a codeblock.
;                R1 (i) holds CB.  Addr result will be in R3 (o).
; LOOKUP (1) - Check a location in a frame before writing to it
;                in order to start any waiting processes.
;                [R1,A2] (i) holds data.
; CNTT (2) - Same as VIDI CNTT.  R1 (i) holds test location,
;                R0 (i) holds MSG to be spawned.
; CALLOC (3) - Allocate the number of words in R1 (i) and
;                return the result as an ADDR in R3 (o).
; LOOKUP_ITER (4) - Like LOOKUP, but takes its offset from A1;
;                thus it takes a base in A1 (i) and an offset in R1 (i).
; CHECK_ITER (5) - A new value for an ID in R1 (i) is put in the
;                first spot of A1 (i) and starts up the loop if
;                the import and PC flags are set.  For now, no
;                PC field included in R0.  This must be fixed up.

; Various fault handlers are also defined:

; CFUT - Replace accessed location with info about current continuation
;        then suspend.
; SEND - Continue after unavoidable delay.

; Additionally, some methods accessed by non-local MSGs are supplied:

; LOCAL_MOVR - Take a MSG of the form:
;                 FD
;                 INT:offset1
;                 INT:value1
;              Eventually, one will be able to send any number
;              of INT,INT pairs.  This stores the values into
;              the offsets of the specified frame.
; LOCAL_GETC - Takes a MSG of the form:
;                 CB to allocate frame for
;                 FD to send new FD to
;                 INT offset in destination FD
;              Locally allocate a frame and send it back
;              to requesting node.  Also start up code block
;              on current node.


;

label FREE_PTR = $800
label STACK_BASE = $801

; stack space will be from $801 up. $800
; will hold the first free location (not last used).
; IXCC creates (from an ADDR) a frame descriptor FD:
;   31 ... 16   15 .. 0
;   <addr>      NNR
; where <addr> should be right-shifted four to be properly placed.
;
; Similarly, a codeblock is typed CB and is:
;   31 ... 16   15 ... 0
;   <addr>      <frame size>
;
; To summarize:
;   GETC: CB -> ADDR (allocate_loc)
;   IXCC: ADDR -> FD
;   MOVR: FD x (INT x ANY)*
;   CNTT: ADDR -> ADDR (because it stays on same processor)


;; J-machine constants
include "/hoae/gn/ellens/Id/ht.adp"
include "/hoae/gn/ellans/Id/netq.adp"
label sys_len_bits = 10
label ABSOLUTE = (1<<8)
label UNCHECKED = (1<<31)

;; Constants for loops
label posPrevious = 0
label maskPrevious = $0000ff
label posCurrent = 8
label maskCurrent = $ff00
label posNext = 16
label maskNext = $ff0000
label posIM = 24
label maskIM = 1<<posIM
label posPC = 25
label maskPC = 1<<posPC

;; User-defined tags
tagname 8 "CB"
tagname 9 "FD"
tagname 10 "ISA"

; System calls
label ALLOCATE = 0
label ALLOCATE_VECTOR = 0
label LOOKUP = 1
label LOOKUP_VECTOR = 1
label CNTT = 2
label CNTT_VECTOR = 2
label CALLOC = 3
label CALLOC_VECTOR = 3
label LOOKUP_ITER = 4
label LOOKUP_ITER_VECTOR = 4
label CHECK_ITER = 5
label CHECK_ITER_VECTOR = 5

namevector ALLOCATE+32, "Allocate"
namevector LOOKUP+32, "Lookup"
namevector CNTT+32, "CNTT"
namevector CALLOC+32, "Calloc"

; Constants for calloc
; For best efficiency, (ISTRUCT_Q_SIZE - 1) * ISTRUCT_Q_ENTRY_SIZE = 0
label ISTRUCT_Q_SIZE = 8
label ISTRUCT_Q_ENTRY_SIZE = 3

module library_code

terrible:
    halt 0
    br ^terrible

; In case of cfut fault, replace CFUT with continuation info.
; Type checking is turned off when this interrupt is entered.
; When we get here, [0,A3] either contains a valid MSG, or
; it contains an IP with p=1, a=1
fault_cfut_loc:
    move R0, ID0            ; *****
    move HIR, R1
; At this point, R1 holds address to store pointer in
fault_cfut_none_allocated:
; allocate a triple from stack
    DC ADDR:FREE_PTR<<sys_len_bits
    move R0, A1
    move [0,A1], R2
    DC INT:3<<sys_len_bits
    add R2, R0, R3
    move R3, [0,A1]
    move R2, A1
; R2 and A1 now point to empty triple
    move ID0, R3            ; *****
fault_cfut_msg_okay:
    move R3, [0,A1]
    move A3, R0
    move R0, [1,A1]
    move [R1,A0], R0
    move R0, [2,A1]
    wtag R2, CFUT, R2       ; Write the triple to where R1 points
    move R2, [R1,A0]
    suspend

; R1 is a CB with input info.
; Result will be an ADDR in R2. Clobbers registers (except A3).
allocate_loc:
    check R1, CB, R3
    bf R3, ^terrible
    wtag R1, INT, R1        ; Not strictly needed
    and R1, $ffff, R1       ; Get size
    DC ADDR:FREE_PTR<<sys_len_bits
    move R0, A1
    move [0,A1], R2
    lsh R1, sys_len_bits, R1 ; Shift size count into place
    add R2, R1, R1
    move R1, [0,A1]
    wtag R2, ADDR, R2
    move FIP, IP

; We need support for MOVR. The format of the message should be:
;   FD
;   INT:offset1
;   INT:value1
; The number of items can be determined from the message header.
; It must be 1 (for now). This also runs in unchecked mode.
local_movr:
    move [1,A3], R1         ; Put frame descriptor into R1
    check R1, FD, R0
    bf R0, ^terrible
    wtag R1, INT, R1
    lsh R1, -16, R1         ; Shift out node number
    lsh R1, sys_len_bits, R1 ; Shift it into address position
;   wtag R1, ADDR, R1
    move R1, A1
; First (and only) word
    move [2,A3], R1
    move [3,A3], R3
    move [R1,A1], R0        ; Save to see if anything waiting
    move R3, [R1,A1]
    bz R0, ^local_movr_done
; We must restart a continuation because R0 <> 0.
local_movr_next_triple:
    move R0, A1
    move NNR, R1
    send R1, 0
    send [0,A1], 0
    sende [1,A1], 0
; We could deallocate the triple around here
    move [2,A1], R0
    bnz R0, ^local_movr_next_triple
local_movr_done:
    suspend

; When a getc is done, it initiates a split-phase transaction
; (according to Iannucci's injection). It sends a message to the
; desired node of the form:
;   <header>                          [0,A3]
;   CB to allocate frame for          [1,A3]
;   FD to send result new FD to       [2,A3]
;   Offset within FD                  [3,A3]
; The job of local_getc, after allocating space, is to notify the
; caller and to set the frame in motion. For obvious reasons, it
; does the two subtasks in that order.
local_getc:
; set up for ALLOCATE call
    DC IP:(local_getc_1<<sys_len_bits)+ABSOLUTE
    move R0, R3
    move [1,A3], R1
    call ALLOCATE
local_getc_1:
; Build up the FD and send it back
    DC {local_movr_msg_ref}
    wtag R2, INT, R2
    lsh R2, 16-sys_len_bits, R2
    move NNR, R1
    add R2, R1, R2
    wtag R2, FD, R2
    send2 [2,A3], R0, 0
    send [2,A3], 0
    send2e [3,A3], R2, 0
; Set up for method specified by code block
    move [1,A3], R1
    wtag R1, INT, R1
    lsh R1, -16, R1         ; Shift out low bits
    lsh R1, sys_len_bits, R1
    add R1, 3, R1           ; Put in length bits
    wtag R1, MSG, R1
; The ADDR is still in R2
    move NNR, R0
    send R0, 0
    send2e R1, R2, 0
    suspend


; fault_send_loc is used to wait, when we send messages too fast.
; This routine is lifted, verbatim, from Waldemar's MS thesis.
; It requires type checking to be disabled.
fault_send_loc:
    move FIP, R0
    rot R0, -9, R0
    sub R0, 1, R0
    rot R0, 9, R0
    move R0, FIP
    move FOP0, R0
    move FIP, IP

; This expects R1 to hold the offset from A3.
; Only R1, A2, and A3 are guaranteed. Checking must be off.
lookup_loc:
    move [R1,A3], R0
    check R0, CFUT, R3
    bf R3, ^terrible        ; Double write
    bz R0, ^lookup_done
    move NNR, R3
lookup_next:
    move R0, A1
    send R3, 0
    send [0,A1], 0
    sende [1,A1], 0
    sende A3, 0
; Deallocate triple
    move [2,A1], R0
    bnz R0, ^lookup_next
lookup_done:
    move FIP, IP

; For CNTT, R1 should hold the offset of the test location,
; and R0 should hold the message name. At least for now,
; the continuation will be spawned to the same node.
; Checking should be off (to avoid CFUT faults).
cntt_loc:
    move [R1,A3], R3        ; Check test location
    check R3, CFUT, R3      ; Is it a CFUT?
    bf R3, ^cntt_send_it    ; If not, we can send
; Instead, key it on [R1,A3]
    move R0, ID0            ; Save MSG
; allocate a triple from stack
    DC ADDR:FREE_PTR<<sys_len_bits
    move R0, A1
    move [0,A1], R2
    DC INT:3<<sys_len_bits
    add R2, R0, R3
    move R3, [0,A1]
; R2 holds base of triple
    move R2, A1
; Fill in triple
    move ID0, R0            ; Restore it
    move R0, [0,A1]
    move A3, R0
    move R0, [1,A1]
    move [R1,A3], R0        ; Take old pointer
    move R0, [2,A1]         ; Put it at end of triple
; Store pointer to new triple
    wtag R2, CFUT, R2       ; <------
    move R2, [R1,A3]
    move FIP, IP
cntt_send_it:
    move NNR, R1
    send2 R1, R0, 0
    sende A3, 0
    move FIP, IP

; R1 holds the number of words requested.
; Result will be an ADDR in R2.
; Preserves A1 through A3.
calloc_loc:
    DC ADDR:FREE_PTR<<sys_len_bits
    move A1, R3
    move R0, A1
    move [0,A1], R2
    lsh R1, sys_len_bits, R1 ; Shift size count into place
    add R2, R1, R1
    move R1, [0,A1]
    wtag R2, ADDR, R2
    move R3, A1
    move FIP, IP

; This expects R1 to hold the offset from A1.
; Only R1, A1, A2, and A3 are guaranteed. Checking must be off.
lookup_iter_loc:
    move [R1,A1], R0
;   check R0, CFUT, R3
;   bf R3, ^terrible
    bz R0, ^lookup_iter_done
; Save it
    move A1, R3
    move R3, A0
    move NNR, R3
lookup_iter_next:
    move R0, A1
    send R3, 0
    send [0,A1], 0
    sende [1,A1], 0
; Deallocate triple
    move [2,A1], R0
    bnz R0, ^lookup_iter_next
    move A0, R3
    move R3, A1
lookup_iter_done:
    move FIP, IP

; check_iter_loc expects R1 to have the value to put into ID [0,A1].
; It moves it there and starts the loop if both flags are set.
; It saves the address registers.
check_iter_loc:
    DC INT:maskIM + maskPC
    move R1, [0,A1]
    and R1, R0, R1
    eq R1, R0, R3
    bt R3, ^check_iter_start
    move FIP, IP

check_iter_start:
    DC {loop_msg_ref}
    move NNR, R1
    send2 R1, R0, 0
    send A1, 0
    sende [slotID,A3], 0
    move FIP, IP

end

fault_vec_addr_p0 + fault_cfut = IP: ((LIBRARY_PLACE+fault_cfut_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
fault_vec_addr_p0 + fault_send = IP: ((LIBRARY_PLACE+fault_send_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + ALLOCATE = IP: ((LIBRARY_PLACE+allocate_loc)<<sys_len_bits) + ABSOLUTE
syscall_vec_addr + LOOKUP = IP: ((LIBRARY_PLACE+lookup_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + CNTT = IP: ((LIBRARY_PLACE+cntt_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + CALLOC = IP: ((LIBRARY_PLACE+calloc_loc)<<sys_len_bits) + ABSOLUTE
ref local_movr_msg_ref = MSG: ((LIBRARY_PLACE+local_movr)<<sys_len_bits)+UNCHECKED+4
ref local_getc_msg_ref = MSG: ((LIBRARY_PLACE+local_getc)<<sys_len_bits)+4
ref local_fetch_msg_ref = MSG: ((LIBRARY_PLACE+local_fetch)<<sys_len_bits)+UNCHECKED+4
ref local_store_msg_ref = MSG: ((LIBRARY_PLACE+local_store)<<sys_len_bits)+UNCHECKED+3

syscall_vec_addr + LOOKUP_ITER = IP: ((LIBRARY_PLACE+lookup_iter_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + CHECK_ITER = IP: ((LIBRARY_PLACE+check_iter_loc)<<sys_len_bits) + ABSOLUTE


include "lotsosots.ndp" ; CFUTUREs for stack
FREE_PTR = INT: STACK_BASE<<sys_len_bits
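
The frame-descriptor arithmetic in local_getc and local_movr can be sketched outside the assembler. The following Python fragment is only an illustration under stated assumptions: words are 32 bits, an ADDR keeps its base above a sys_len_bits-wide length field, and the length field is zero when the FD is built. The helper names (addr_word, make_fd, fd_to_addr, fd_node) are hypothetical and do not appear in the library.

```python
SYS_LEN_BITS = 10   # matches `label sys_len_bits = 10` in the listing
NODE_BITS = 16      # assumption: the low 16 bits of an FD hold the node number


def addr_word(base: int) -> int:
    """Model an ADDR word with an empty length field (assumed layout)."""
    return base << SYS_LEN_BITS


def make_fd(addr: int, node: int) -> int:
    """Mirror local_getc: lsh by 16 - sys_len_bits, then add the node number."""
    return ((addr << (NODE_BITS - SYS_LEN_BITS)) + node) & 0xFFFFFFFF


def fd_to_addr(fd: int) -> int:
    """Mirror local_movr: lsh by -16 drops the node, lsh by sys_len_bits restores the base."""
    return (fd >> NODE_BITS) << SYS_LEN_BITS


def fd_node(fd: int) -> int:
    """Recover the node number kept in the low half of the descriptor."""
    return fd & ((1 << NODE_BITS) - 1)


if __name__ == "__main__":
    fd = make_fd(addr_word(0x801), node=5)
    print(hex(fd), hex(fd_to_addr(fd)), fd_node(fd))
```

Packing the node number into the low half is what lets a sender use the FD itself as the routing word of a message and still recover the local frame address on arrival.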
B.2 I-Structure Routines


; This is a changed version of IstructS.adp that uses different
; representations:
;   EMPTY - nil CFUT
;   WAITING - non-null CFUT
;   DATA - non-CFUT
; It also goes through local_movr.
; The format of i-structure addresses are:
;   The low 16 bits hold the node number
;   The high 16 bits hold the address on that node
; I-structure addresses are typed TAGS, which will be defined
; to ITAG.

;; This is what code to fetch an I-structure cell looks like:
; With the pointer (tagged int) in R1 and the i-struct offset in R2,
; and the frame offset in R3.
i_fetch_code:
    dc {system_fetch_msg_ref}
    send20 R1, R0           ; Send node number, header
    send20 R1, R2           ; Send ISA
    send20e A3, R3          ; Send frame, offset
    suspend

; System fetch gets:
;   [0,A3]: MSG:<system-fetch>
;   [1,A3]: INT:<i-structure address>
;   [2,A3]: INT:<offset from i-structure>
;   [3,A3]: FD:<frame of dest>
;   [4,A3]: INT:<offset from frame>
;;; WARNING: SENSITIVE TO BIT CHANGES:
;;; Specifically, assumes SYS_LEN_BITS = 10,
;;;                       MAX_NODES = 2^16
system_fetch:
    move [1,A3], R1         ; Put ISA in R1
    lsh R1, -16, R1         ; Slide over address portion to del node #
    lsh R1, 10, R1          ; Slide into address position
    move R1, A1             ; A1 now holds the address of base
    move [2,A3], R2         ; Put offset into R2
    gt R2, [1,A1], R3       ; If it's greater than upper bound...
    bt R3, ^i_err           ; ...it's an error.
    move [0,A1], R0         ; Put lower bound in R0
    sub R2, R0, R2          ; Subtract off base
    lt R2, 0, R3            ; If it's lower than base...
    bt R3, ^i_err           ; ...then it's an error
    add R2, 2, R2           ; Point past two bounds words
    move [R2,A1], R1        ; Take item in i-structure spot
    check R1, CFUT, R0
    bt R0, ^data_not_present
; If we get here, we have the data and can return it.
    send0 [3,A3]            ; Node number of destination
    send0 [4,A3]            ; MSG header of destination
    send20e [1,A3], R0      ; context, value
    suspend

;;; This case handles both a first and subsequent store.
;;; It allocates a triple for a linked-list.
data_not_present:
; If we get here, [R2,A1], the reference, R1, holds a cfuture.
; Get triple
    DC ADDR:FREE_PTR<<sys_len_bits
    move R0, A2
    DC 3<<sys_len_bits
    move [0,A2], R3         ; Put start location in R3
    move R3, A2             ; A2 now holds a ptr to a new triple
    add R3, R0, R3          ; Put next free location in R3...
    move R3, [0,A2]         ; ...and then back into free ptr
; Store A2 into i-structure location
    move A2, R0
    move R0, [R2,A1]
; Store in the following order:
;   Frame number of destination
;   Offset w/in frame
;   Next ptr
    move [3,A3], R0         ; Frame number
    move R0, [0,A2]
    move [4,A3], R0         ; Frame offset
    move R0, [1,A2]
    move R1, [2,A2]         ; Next ptr

; System-store gets:
;   [0,A3]: MSG:<system-store>
;   [1,A3]: INT:<i-structure address>
;   [2,A3]: INT:<offset>
;   [3,A3]: <data>
system_store:
    move [1,A3], R1         ; Put ISA in R1
    lsh R1, -16, R1         ; Slide over address portion to del node #
    lsh R1, 10, R1          ; Slide into address position
    move R1, A1             ; A1 now holds the address of base
    move [2,A3], R2         ; Put offset into R2
    gt R2, [1,A1], R3       ; If it's greater than upper bound...
    bt R3, ^i_err           ; ...it's an error.
    move [0,A1], R0         ; Put lower bound in R0
    sub R2, R0, R2          ; Subtract off base
    lt R2, 0, R3            ; If it's lower than base...
    bt R3, ^i_err           ; ...then it's an error
    add R2, 2, R2           ; Point past two bounds words
    move [R2,A1], R1        ; Take item in i-structure spot
    check R1, CFUT, R0      ; It had better be a cfuture.
    bf R0, ^i_err           ; If not, it's a write-twice error.
    move [3,A3], R3         ; Put data value into R3
    move R3, [R2,A1]        ; Store it into i-structure
    DC {local_movr_msg_ref}
    bz R1, ^sends_done
; At this point, R1 holds base of next linked-list entry.
; R0 holds the local_movr_msg_ref.
send_loop:
    move R1, A1
    send20 [0,A1], R0       ; Send node #, MSG header
    send0 [0,A1]            ; Send FD
    send2e0 [1,A1], R3      ; Send offset, data.
    move [2,A1], R1
    bnz R1, ^send_loop
sends_done:
    suspend

; Out of bounds or double-write error.
i_err:
    halt 1

end
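
The cell protocol described in the header of this section (EMPTY, WAITING, DATA) can be modelled at a higher level. The Python sketch below is an assumption-laden illustration and not the library code: it replaces the CFUT-tagged linked list of triples with a Python list of callbacks, and the names IStructure, IStructureError and split_isa are hypothetical.

```python
class IStructureError(Exception):
    """Stands in for the `halt 1` taken at i_err."""


def split_isa(isa: int):
    """Split an i-structure address: low 16 bits node number, high 16 bits local address."""
    return isa & 0xFFFF, (isa >> 16) & 0xFFFF


class IStructure:
    def __init__(self, lower: int, upper: int):
        self.lower, self.upper = lower, upper
        size = upper - lower + 1
        self._present = [False] * size               # False: EMPTY or WAITING
        self._value = [None] * size
        self._waiting = [[] for _ in range(size)]    # deferred readers per cell

    def _index(self, offset: int) -> int:
        # Same two bounds checks as system_fetch and system_store.
        if offset > self.upper or offset < self.lower:
            raise IStructureError("out of bounds")
        return offset - self.lower

    def fetch(self, offset: int, reply) -> None:
        i = self._index(offset)
        if self._present[i]:
            reply(self._value[i])                    # DATA: answer immediately
        else:
            self._waiting[i].append(reply)           # EMPTY/WAITING: defer the reader

    def store(self, offset: int, value) -> None:
        i = self._index(offset)
        if self._present[i]:
            raise IStructureError("write twice")
        self._present[i], self._value[i] = True, value
        waiters, self._waiting[i] = self._waiting[i], []
        # The assembly pushes new triples at the head of the list,
        # so the most recent reader is woken first.
        for reply in reversed(waiters):
            reply(value)


if __name__ == "__main__":
    cells = IStructure(1, 4)
    seen = []
    cells.fetch(2, seen.append)   # deferred: nothing stored yet
    cells.store(2, "x")           # wakes the deferred reader
    print(seen)
```

A fetch that arrives before the store costs one deferred entry and one extra message later; a fetch that arrives after the store is answered at once.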
B.3 Loop Support


;; Constants for loops
label posPrevious = 0
label maskPrevious = $0000ff
label posCurrent = 8
label maskCurrent = $ff00
label posNext = 16
label maskNext = $ff0000
label posIM = 24
label maskIM = 1<<posIM
label posPC = 25
label maskPC = 1<<posPC

; System calls
label CHECK_ITER = 5

; Expects R0 to have the value (maskIM or maskPC) to be or'ed into
; the ID R1 (+/-1) off from the current iteration.
; It moves it there and starts the loop if both flags are set.
; It saves the address registers.
; For now only, ignore wraparound
check_iter_loc:

Tmat  aia. 


    move [2,A2], R3
    add R3, R1, R3
; This sequence converts a
    lt R2, [2,A2], R3
    wtag R3, INT, R3
    neg R3, R3
    and R3, [3,A2], R3
    sub R2, R3, R2
; Yo! I can do even better:
    lt R2, [2,A2], R3
    wtag R3, INT, R3
    neg R3, R3
    and R2, R3, R3
; Whoops, must also convert
    sub R2, [slotI,A3], R1
    ge R1, -1, R1
    wtag R1, INT, R1
    neg R1, R1
    and R1, [slotI,A3], R1
    sub R2, R1, R3
; Oops! above converted k-1
    lt R2, [slotI,A3], R3
    wtag R3, INT, R3
    neg R3, R3
    and R2, R3, R3
    lt R2, 0, R3
    wtag R3, INT, R3
    neg R3, R3
    and R3, [slotI,A3], R3
    add R2, R3, R3

    DC INT:maskIM + maskPC
    and R1, R0, R3
    eq R3, R0, R3
    bt R3, ^check_iter_start
    move R1, [R2,A3]
    move FIP, IP

check_iter_start:
    not R0, R0              ; Turn off flags
    and R1, R0, R1
    move R1, [R2,A3]
    sub R2, frame_size+1, R2
    DC {loop_msg_ref}
    move NNR, R3
    send2 R3, R0, 0
    send2e A3, R2, 0
    move FIP, IP
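
The flag test at the end of check_iter_loc can be illustrated with ordinary integer arithmetic. This Python sketch assumes the field layout listed under "Constants for loops", with the two flag positions read as 24 and 25 (the printed digits are hard to make out, so treat them as assumptions); the function names are hypothetical.

```python
# Field layout of an iteration descriptor (ID), as read from the loop constants.
MASK_PREVIOUS = 0x0000FF    # bits 0..7
MASK_CURRENT = 0x00FF00     # bits 8..15
MASK_NEXT = 0xFF0000        # bits 16..23
POS_IM, POS_PC = 24, 25     # assumed flag positions
MASK_IM = 1 << POS_IM       # "import" flag
MASK_PC = 1 << POS_PC       # "PC" flag
BOTH_FLAGS = MASK_IM | MASK_PC


def set_flag(descriptor: int, flag: int) -> int:
    """Or one of the two flags into the descriptor."""
    return descriptor | flag


def ready_to_start(descriptor: int) -> bool:
    """Mirror `and R1, R0, R3` / `eq R3, R0, R3`: both flags must be set."""
    return descriptor & BOTH_FLAGS == BOTH_FLAGS


def clear_flags(descriptor: int) -> int:
    """Mirror `not R0, R0` / `and R1, R0, R1` in check_iter_start."""
    return descriptor & ~BOTH_FLAGS


if __name__ == "__main__":
    d = 0x030201                      # next=3, current=2, previous=1
    d = set_flag(d, MASK_IM)
    print(ready_to_start(d))          # only one flag so far
    d = set_flag(d, MASK_PC)
    print(ready_to_start(d), hex(clear_flags(d)))
```

Whichever flag arrives second is the one that starts the iteration, so the two events may occur in either order.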
Appendix C


Source Code


C.1 Convert Hybrid to Complex J


;;; Mode:Common-Lisp; Package:ID-COMPILER; Base:10
;;; hybrid-to-cj converts hybrid code to complex J-machine code.
;;; The next step is to send it through cj-to-sj to change it to
;;; J-machine s-expressions.

(in-package 'id-compiler)

(defcompiler-module convert-hybrid-to-complex-j id-compiler
  (:input vnd-instructions code-block)
  (:function convert-hybrid-to-cj)
  (:output vnd-instructions code-block) ; This is a lie
; (:before-function procedure file-asm-before-def)
; (:after-function procedure asm-after-def)
; (:wrapper-macro vnd-file-assembler-wrapper)
; (:options input-file vnd-output-file vnd-output-file-format)
  )


;;; J-machine constants
;;; Originally, these were numbers. They are more readable as symbols and
;;; can be replaced by MDPSim. The constants are needed to know if they're okay literals.
(defconstant *sys-len-bits* 10)
(defconstant sym-tag 'sym)
(defconstant sym 0)
(defconstant int-tag 'int)
(defconstant int 1)
(defconstant fd-tag 'fd)
(defconstant fd 9)
(defconstant boolean-tag 'bool)
(defconstant bool 2)
(defconstant addr-tag 'addr)
(defconstant addr 3)
; For UFs U4 ST1O0U (T)
(defconstant special-tag 'special_tag)
(defconstant special_tag 32)
(defconstant allocate-vector 'allocate_vector)
(defconstant allocate_vector 0)
(defconstant lookup-vector 'lookup_vector)
(defconstant lookup_vector 1)
(defconstant cntt-vector 'cntt_vector)
(defconstant cntt_vector 2)
(defconstant cntt-loop-vector 'cntt_loop-vector)
(defconstant cntt_loop-vector 3)
(defconstant calloc-vector 'calloc_vector)
(defconstant calloc_vector 4)
(defconstant check-iter-vector 'check_iter-vector)
(defconstant check_iter-vector 5)
(defconstant *posIM* 24)
(defconstant *posPC* 25)
(defconstant *maskIM* (expt 2 *posIM*))
(defconstant *maskPC* (expt 2 *posPC*))

(defun convert-hybrid-to-cj (cb)
  (let* ((cj-instructions (convert-hybrid-to-cj-inner (dataflow-graph-root-set cb)
                                                      (dataflow-graph-get cb :frame-descriptor))))
    (setf (dataflow-graph-root-set cb) cj-instructions))
  cb)

(defun convert-hybrid-to-cj-inner (instructions frame-desc)
  (if (null instructions)
      nil
      (let* ((instruction (car instructions))
             (opcode (car instruction))
             ; Get rid of hybrid register references -- ouch
             (operands (mapcar #'transform-hybrid-register (copy-list (cdr instruction))))
             (suspensive-code (mutate-suspensive-operands opcode operands))
             (fn (convert-opcode-to-fn opcode)))
        (if (null fn)
            (my-error :fatal nil (format nil "No opcode for function ~S" opcode)))
        (append
         `((hybrid-instruction ,instruction))
         suspensive-code
         (apply fn frame-desc operands)
         (convert-hybrid-to-cj-inner (cdr instructions) frame-desc)))))

(defvar *conversion-list*)

(defun convert-opcode-to-fn (op)
  (cdr (assoc op *conversion-list*)))

;; Very inefficient
(defun transform-hybrid-register (op)
  (if (and (listp op)
           (eq (car op) :register)
           (numberp (second op)))
      `(:temporary (:base ,(second op)))
      op))
(defmacro suspensivep (operand)
  `(member :suspensive ,operand))

;;; A few hours with this section could yield some major optimizations,
;;; not to mention what could be done with register allocation.
(defun mutate-suspensive-operands (opcode operands)
  (let ((suspensive-code (mutate-suspensive-operands-inner operands)))
    ; special
    (if (not (eq opcode :continue-test))
        (if suspensive-code
            (cons '(suspensive-instruction)
                  ; remove-duplicates to ensure only one
                  ; check for (:add (:suspensive X) (:suspensive X) Y)
                  (append (remove-duplicates suspensive-code :test #'equal)
                          '((suspensive-check-done))))))))

(defun mutate-suspensive-operands-inner (operands)
  (if (null operands)
      nil
      (append
       (if (suspensivep (car operands))
           (progn
             (setf (car operands) (remove :suspensive (car operands)))
             `((suspensive-operand ,(car operands)))))
       (mutate-suspensive-operands-inner (cdr operands)))))

(defvar *conversion-list*)
(setq *conversion-list* nil)

(defmacro defconversion (hybrid-name hybrid-symbol operands body)
  (progn
    (setq *conversion-list*
          (cons (cons hybrid-symbol hybrid-name)
                *conversion-list*))
    (let ((full-op-list (cons 'frame-desc operands)))
      `(defun ,hybrid-name ,full-op-list
         frame-desc
         ,body))))

(defun frame-base-offset (operand)
  (if (eq (car operand) :frame)
      (base-offset operand)
      (error :fatal nil "Illegal operand supplied when frame-base value expected.")))

;; Used by cj-to-sj
(defun message-base-offset (operand)
  (if (eq (car operand) :message)
      (base-offset operand)
      (error :fatal nil "Illegal operand supplied when message-base value expected.")))

(defun base-offset (operand)
  (if (eq (car (second operand)) :base)
      (second (second operand))
      (error :fatal nil "Illegal operand supplied when base-offset value expected.")))

(defun literal-base-offset (operand)
  (if (and (eq (car operand) :literal)
           (eq (car (second operand)) :base))
      (second (second operand))
      (error :fatal nil "Illegal operand supplied when literal-base value expected.")))

(defconversion getc :get-context (context-slot return-slot)
  `((reserve (:register scratch))
    (move (:j-register A3) (:register scratch))
    (wtag (:register scratch) (:literal ,int-tag) (:register scratch))
    (lsh (:register scratch) (:literal ,(- 16 *sys-len-bits*)) (:register scratch))
    (reserve (:register scratch2))
    (move (:j-register NNR) (:register scratch2))
    (add (:register scratch)
         (:register scratch2)
         (:register scratch))
    (free (:register scratch2))
    (wtag (:register scratch) (:literal ,fd-tag) (:register scratch))
    (send0 (:literal 1))
    (send0 (:ref local_getc))
    (send0 ,context-slot)
    (send0 (:register scratch))
    (free (:register scratch))
    (sende0 ,(frame-base-offset return-slot))))

;;; Something should be done to handle falling into a loop
(defconversion label :label (label-name)
  `((label ,label-name)
    (move (:message (:base 1)) (:j-register A3))))

(defun lookup-into (dest)
  (if (eq (car dest) :frame)
      `((move (:literal ,(frame-base-offset dest)) (:j-register R1))
        (call (:literal ,lookup-vector)))))

;; For now, no loops
(defconversion move :move (source dest)
  (append (lookup-into dest)
          `((move ,source ,dest))))

(defconversion move-identity :move-identity (source dest)
  (append (lookup-into dest)
          `((move ,source ,dest))))

(defconversion cntt :continue-test (check-slot cont)
  ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
  `((move (:ref ,(second (second cont))) (:j-register R0))
    (move (:literal ,(frame-base-offset check-slot)) (:j-register R1))
    (call (:literal ,cntt-vector))))

(defconversion cntn :continue (cont)
  `((send0 (:j-register NNR))
    ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
    (send0 (:ref ,(second (second cont))))
    (sende0 (:j-register A3))))

(defconversion movr :move-remote (frame-ptr offset value)
  `((send0 ,frame-ptr)
    (send0 (:ref local_movr))
    (send0 ,frame-ptr)
    (send0 ,offset)
    (sende0 ,value)))

;;; This should set a flag
(defconversion terminate :terminate ()
  '((suspend)))

(defconversion le :<= (s1 s2 d)
  (append (lookup-into d)
          `((le ,s1 ,s2 ,d))))
(defconversion lt :< (s1 s2 d)
  (append (lookup-into d)
          `((lt ,s1 ,s2 ,d))))

(defconversion gt :> (s1 s2 d)
  (append (lookup-into d)
          `((gt ,s1 ,s2 ,d))))

(defconversion ge :>= (s1 s2 d)
  (append (lookup-into d)
          `((ge ,s1 ,s2 ,d))))

(defconversion j-eq := (s1 s2 d)
  (append (lookup-into d)
          `((eq ,s1 ,s2 ,d))))

(defconversion j-neq :<> (s1 s2 d)
  (append (lookup-into d)
          `((neq ,s1 ,s2 ,d))))

(defconversion j-neq2 :/= (s1 s2 d)
  (append (lookup-into d)
          `((neq ,s1 ,s2 ,d))))

(defconversion j-and :and (s1 s2 d)
  (append (lookup-into d)
          `((and ,s1 ,s2 ,d))))

(defconversion j-or :or (s1 s2 d)
  (append (lookup-into d)
          `((or ,s1 ,s2 ,d))))

(defconversion j-sub :- (s1 s2 d)
  (append (lookup-into d)
          `((sub ,s1 ,s2 ,d))))

(defconversion j-add :+ (s1 s2 d)
  (append (lookup-into d)
          `((add ,s1 ,s2 ,d))))

(defconversion j-mul :* (s1 s2 d)
  (append (lookup-into d)
          `((mul ,s1 ,s2 ,d))))

(defconversion j-not :not (s d)
  (append (lookup-into d)
          `((not ,s ,d))))


(defconversion j-abs :abs (s d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ash ,s -31 (:register scratch1))
            (xor ,s (:register scratch1) (:register scratch2))
            (sub (:register scratch2) (:register scratch1) ,d)
            (free (:register scratch1))
            (free (:register scratch2)))))

(defconversion j-max :max (a b d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
                                                                          ; a >= b   a < b
            (ge ,a ,b (:register scratch1))                               ; R1: T    R1: F
            (wtag (:register scratch1) ,int-tag (:register scratch1))     ; R1: 1    R1: 0
            (neg (:register scratch1) (:register scratch1))               ; R1: -1
            (and (:register scratch1) ,a (:register scratch2))            ; R2: a
            (not (:register scratch1) (:register scratch1))               ; R1: 0
            (and (:register scratch1) ,b (:register scratch1))            ; R1: 0
            (or (:register scratch2) (:register scratch1) ,d)             ; a
            (free (:register scratch1))
            (free (:register scratch2)))))

(defconversion j-min :min (a b d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ge ,a ,b (:register scratch1))
            (wtag (:register scratch1) ,int-tag (:register scratch1))
            (neg (:register scratch1) (:register scratch1))
            (and (:register scratch1) ,b (:register scratch2))
            (not (:register scratch1) (:register scratch1))
            (and (:register scratch1) ,a (:register scratch1))
            (or (:register scratch2) (:register scratch1) ,d)
            (free (:register scratch1))
            (free (:register scratch2)))))

;; Not used
(defconversion loop-setup :loop-setup (label-name)
  `(let ((frame-size (frame-descriptor-frame-size frame-desc))
         (k-slot (compute-slot-offset t :maximum-iterations))
         (slots-per-iteration (frame-descriptor-next-available-iteration-slot frame-desc))
         (loop-setup-label (gensym 'loop-loop)))
     `((dc (:literal ,(* frame-size (expt 2 *sys-len-bits*))))
       (move (:j-register A2) (:j-register R2))
       (wtag (:j-register R3) ,int-tag (:j-register R3))
       (add (:j-register R3) (:j-register R0) (:j-register R3))
       (wtag (:j-register R2) ,addr-tag (:j-register R2))
       (move (:j-register R2) (:j-register R1))
       (move (:frame (:base ,k-slot)) (:j-register R1))
       (sub (:j-register R1) (:literal 1) (:j-register R1))
       (move ,(* 2 frame-size) (:j-register R3))
       (add (:j-register R3) (:frame (:base ,k-slot)) (:j-register R3))
       (dc (:literal *maskIM*))
       (or (:j-register R3) (:j-register R0) (:j-register R3))
       (move (:literal 1) (:j-register R3))
       (label ,loop-setup-label)
       (move (:j-register R3) (:frame (:loop 11)))
       (lt (:j-register R3) (:j-register R1) (:j-register R0))
       (add (:j-register R3) (:literal ,frame-n-iterations) (:j-register R2))
       (add (:j-register R3) 1 (:j-register R3))
       (bt (:j-register R0) ,loop-setup-label)
       (dc ,(lognot (logior *maskIM* *maskPC*)))
       (and (:j-register R3) (:j-register R0) (:j-register R3))
       (move (:j-register R2) (:frame (:loop (:j-register R3))))
       (move (:j-register R2) (:frame (:loop 0)))
       (move (:frame (:loop 1)) (:j-register R2))
       (and (:j-register R3) (:j-register R0) (:j-register R3))
       (add (:j-register R3) 1 (:j-register R3))
       (move (:j-register R2) (:frame (:loop (:j-register R3)))))))

;; Register traces printed beside the j-max and j-min sequences above
;; (one column per case, in instruction order):
;;   j-min, a >= b:             R1: T, R1: 1, R1: -1, R2: b, R1: 0, R1: 0, result b
;;   j-max, a < b (continued):  R1: 0, R2: 0, R1: -1, R1: b, result b
;;   j-min, a < b:              R1: F, R1: 0, R1: 0, R2: 0, R1: -1, R1: a

;; from (:literal (:symbol :SQ-1)) to (:label :SQ-1)
(defun convert-label (l)
  `(:tagged-literal :special-tag (:label ,(second (second l)))))

(defconversion brf :branch-false (s1 s2)
  `((bf ,s1 ,(convert-label s2))))

(defconversion brt :branch-true (s1 s2)
  `((bt ,s1 ,(convert-label s2))))

(defconversion brz :branch-zero (s1 s2)
  `((bz ,s1 ,(convert-label s2))))

(defconversion brnz :branch-not-zero (s1 s2)
  `((bnz ,s1 ,(convert-label s2))))

(defconversion br :branch (s1)
  `((br ,(convert-label s1))))

(defconversion ixcc :index-current-context (frame-base dest)
  (append (lookup-into dest)
          `((reserve (:register scratch))
            (move (:j-register A3) (:register scratch))
            (wtag (:register scratch) (:literal ,int-tag) (:register scratch))
            (add (:register scratch)
                 (:literal ,(* (literal-base-offset frame-base)
                               (expt 2 *sys-len-bits*)))
                 (:register scratch))
            (lsh (:register scratch) (:literal ,(- 16 *sys-len-bits*)) (:register scratch))
            (add (:register scratch) (:j-register NNR) (:register scratch))
            (wtag (:register scratch) (:literal ,fd-tag) (:register scratch))
            (move (:register scratch) ,dest)
            (free (:register scratch)))))

;; These are okay because the operands will be suspensive
;; and caught by mutate-suspensive-operand.

(defconversion tst2 :test-2 (s1 s2 dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))

(defconversion tst1 :test-1 (s1 dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))

(defconversion stst1 :special-test-1 (s1)
  `((suspensive-instruction)
    (suspensive-operand ,s1)))

(defconversion retc :return-context (source dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))
C.2 Convert Complex J to Simple J


;;; Mode:Common-Lisp; Package:ID-COMPILER; Base:10

;;; cj-to-sj converts complex J-machine code (as produced by hybrid-to-cj)
;;; into J-machine s-expressions.  The s-expressions will correspond on an
;;; exact one-to-one basis with J-machine instructions.  The final step is
;;; to send it through sj-to-j, in the file of that name.
;;;
;;; Complex j-machine code differs from j-machine code in several ways:
;;;
;;; - At the beginning of every possibly suspensive instruction,
;;;     (:suspensive-instruction)
;;;   appears.  For each possibly suspensive operand,
;;;     (:suspensive-operand <operand>)
;;;   These must be converted to appropriate code.
;;;
;;; - (:reserve <symbol>) and (:free <symbol>) are used to bind the value
;;;   of the symbol so that (:register <symbol>) is meaningful.  The
;;;   usage is of the form:
;;;     (:reserve (:register scratch))
;;;     (:move (:j-register A1) (:register scratch))
;;;     (:free (:register scratch))
;;;   The usage is purposely verbose, to allow a change of representation,
;;;   as well as error-checking.  (Reserving a second register of the same
;;;   name, using a nonreserved register, and freeing a nonreserved register
;;;   are all errors.)
;;;
;;; - Specific register names are denoted with :j-register, i.e. (:j-register 'A2).
;;;   The only time specific GPRs are used is to set up for CALLs.  This is
;;;   almost certainly a violation of abstraction.  This is a source of potential
;;;   bugs as well if this module trashes those registers.
;;;
;;; - No consideration is made whether the operation can fit in one J-instruction.
;;;   In many cases, it cannot.  For example, this is a legal cj instruction:
;;;     (:add (:frame (:base 0))
;;;           (:literal 83933)
;;;           (:frame (:base 9)))
;;;
;;; - There are both :literal and :tagged-literal operands.
;;;
;;; The register allocation is correct and stable, to the best of my knowledge.
;;; It is non-optimal but acceptable.

(in-package 'id-compiler)

;;; For some reason that I can't figure out, I'm having trouble getting neq.
(defmacro neq (a b)
  `(not (eq ,a ,b)))

(defcompiler-module convert-complex-j-to-simple-j id-compiler
  (:input md-instructions code-block)   ; A lie
  (:before-function procedure reset-cj-to-sj-system)
  (:function convert-cj-to-sj)
  (:output md-instructions code-block)) ; Yuck!  I've got to fix these abstractions

(defun reset-cj-to-sj-system ()
  (setq *j-instructions* nil)
  (setq *virtual-registers* nil)
  (setq *free-register-list* genl-purpose-regs))

(defun my-error (a b c)
  (print c)
  (break)
  (error a b c))

;;; These are functions to specify basic J-machine characteristics.

(defun make-tagged-literal (l)
  (cond ((numberp l) `(:tagged-literal ,int-tag ,l))
        ((referencep l) `(:tagged-literal ,special-tag ,l))
        ((eq (car l) :tagged-literal) l)
        ;; Converts from (:label (:literal (:symbol :foobar))) to
        ;; (:tagged-literal special-tag (:label (:symbol :foobar)))
        ((eq (car l) :label)
         (list :tagged-literal special-tag (list :label l)))
        ;; (list :tagged-literal special-tag (list :label (second (second (second l))))))
        ((eq (car l) :literal)
         (if (listp (second l))
             (if (eq (car (second l)) :integer)
                 `(:tagged-literal ,int-tag ,(second (second l)))
                 (list :tagged-literal special-tag (second l)))
             (list :tagged-literal int-tag
                   (if (listp (second l))
                       (if (eq (car (second l)) :integer)
                           (second (second l))
                           (my-error :fatal nil "Illegal format of literal"))
                       (second l)))))
        (t nil)))

;; Only converts if appropriate
(defun make-tagged-literal-if-appropriate (l)
  (let ((result (make-tagged-literal l)))
    (if result
        result
        l)))

(defun hex-value (h)
  (cond ((and (>= h #\0) (<= h #\9)) (- h #\0))
        ((and (>= h #\A) (<= h #\F)) (+ 10 (- h #\A)))
        ((and (>= h #\a) (<= h #\f)) (+ 10 (- h #\a)))))

(defmacro hex-to-dec (h-string)
  (do ((count (- (length h-string) 1) (- count 1))
       (value 0 (+ (* value 16)
                   (hex-value (char h-string count)))))
      ((< count 0)
       value)))
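
;;; Illustrative sketch, not part of the original listing.  The helper name
;;; HEX-STRING-VALUE is hypothetical.  Assuming a standard Common Lisp
;;; implementation, the same conversion can be written with PARSE-INTEGER,
;;; which reads the digits most-significant first and signals an error on
;;; characters that are not hexadecimal digits.
(defun hex-string-value (h-string)
  "Return the integer denoted by the hexadecimal digits in H-STRING."
  (parse-integer h-string :radix 16))

;;; Usage: (hex-string-value "3ff") evaluates to 1023.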

(defconstant op0-literals (list
                           (cons sym-tag 0)      ; nil
                           (cons boolean-tag 0)  ; false
                           (cons boolean-tag 1)  ; true
                           (cons int-tag (hex-to-dec "80000000"))
                           (cons int-tag (hex-to-dec "ff"))
                           (cons int-tag (hex-to-dec "3ff"))
                           (cons int-tag (hex-to-dec "ffff"))
                           (cons int-tag (hex-to-dec "fffff"))))

(defun op0-literal-p (l)
  (op0-literal-p-inner l nil))

(defun op0-extended-literal-p (l)
  (op0-literal-p-inner l t))

(defun tagged-literal-p (op)
  (eq (car op) :tagged-literal))

(defun j-register-p (op)
  (or (eq (car op) :j-register)
      (eq (car (translate-operand op)) :j-register)))

;; To distinguish it from dfcs frames.
(defun j-framep (op)
  (eq (car op) :frame))

(defun j-offset-p (op)
  (or (j-framep op)
      (j-messagep op)
      (j-temporary op)))

(defun j-temporary (op)
  (eq (car op) :temporary))

(defun j-messagep (op)
  (eq (car op) :message))

(defun labelp (op)
  (eq (car op) :label))

(defun j-symbolp (op)
  (eq (car op) :symbol))

(defun referencep (op)
  (eq (car op) :ref))

(defun bindingp (op)
  (eq (car op) :binding))

;(print (output-tagged-literal (make-tagged-literal '(:literal int))))

(defun op0-literal-p-inner (l extendedp)
  (if (tagged-literal-p l)
      (let* ((tag (second l))
             (value (if (eq tag int-tag)
                        (eval (third l))  ; To allow us to use symbols instead of ints
                        (third l))))
        (cond ((numberp value)
               (if (member (cons tag value) op0-literals :test #'equal)
                   t
                   (if extendedp
                       (and (eq tag int-tag)
                            (>= value -64)
                            (<= value 63))
                       (and (eq tag int-tag)
                            (>= value -16)
                            (<= value 16)))))
              ((labelp value) nil)  ; Safe assumption
              (t nil)))))

(defun op0-operand-p (op)
  (op0-operand-p-inner op nil))

(defun op0-extended-operand-p (op)
  (op0-operand-p-inner op t))
(defun op0-operand-p-inner (op extendedp)
  (cond ((j-register-p op)
         (let* ((actual (translate-operand op))
                (value (second actual)))
           (or (genl-purpose-reg-p op)
               (eq value 'A0) (eq value 'A1) (eq value 'A2) (eq value 'A3))))
        ((tagged-literal-p op)
         (op0-literal-p-inner op extendedp))
        ((j-offset-p op)
         (let ((offset (base-offset op)))
           (cond ((numberp offset)
                  (if extendedp
                      (and (< offset 63) (>= offset 0))
                      (and (< offset 16) (>= offset 0))))
                 ((genl-purpose-reg-p offset)
                  t))))))

;; Register-oriented op0 mode
(defun rop0-operand-p (op)
  (let* ((operand (translate-operand op))
         (value (second operand)))
    (and (j-register-p operand)
         (or (genl-purpose-reg-p op)
             (member value '(A0 A1 A2 A3 NNR IP))))))  ; More exist, but these only ones used

(defconstant genl-purpose-regs '(R3 R2 R1 R0))

(defun genl-purpose-reg-p (operand)
  (let ((op (translate-operand operand)))
    (or (bindingp op)
        (and (j-register-p op)
             (member (second op) genl-purpose-regs)))))

(defun boole-add (arg1 &rest args)
  (+ (if arg1 1 0)
     (count t args)))

;;; Current register scheme due in part to Rato.
;;; This system is still primitive.  Some notable omissions:
;;;  - It might reload a register with a value already in it.

(defvar *free-register-list*)
(setq *free-register-list* genl-purpose-regs)

(defvar symbols-bound-to-regs)
(setq symbols-bound-to-regs nil)

(defun request-register-inner ()
  (if (null *free-register-list*)
      (my-error :fatal nil "No registers available in request-register-inner")
      (let* ((temp (remove 'R0 *free-register-list*))
             (reg (if (null temp) 'R0 (car temp))))
        (setq *free-register-list* (remove reg *free-register-list*))
        reg)))

(defun request-appropriate-register (item)
  (if (genl-purpose-reg-p item)
      (my-error :fatal nil "Reg-reg move requested!"))
  (if (and (tagged-literal-p item)
           (not (op0-extended-literal-p item)))
      (if (member 'R0 *free-register-list*)
          (progn (setf *free-register-list* (remove 'R0 *free-register-list*))
                 (setf symbols-bound-to-regs (cons (cons (gensym 'reg) 'R0) symbols-bound-to-regs))
                 `(:binding ,(caar symbols-bound-to-regs)))
          ;; If we get here, we need to slide R0 into another register
          (let* ((r0-pair (rassoc 'R0 symbols-bound-to-regs))
                 (new-reg (request-register-inner))  ; Get another register
                 (cur-name (gensym)))                ; Name to return with new register
            ;; Emit the move -- to a global??
            (if (null r0-pair)
                (my-error :fatal nil "R0 invariant violated"))
            (emit-j-instruction `(move (:j-register R0) (:j-register ,new-reg)))
            (setf (cdr r0-pair) new-reg)
            (setq symbols-bound-to-regs
                  (cons (cons cur-name 'R0)
                        symbols-bound-to-regs))
            `(:binding ,cur-name)))
      (request-any-register)))

(defun request-any-register ()
  (let ((reg (request-register-inner)))
    (setq symbols-bound-to-regs (cons (cons (gensym 'reg) reg) symbols-bound-to-regs))
    `(:binding ,(caar symbols-bound-to-regs))))

(defun return-register (reg)
  (if (eq (car reg) :binding)
      (let ((pair (assoc (second reg) symbols-bound-to-regs)))
        (if (null pair)
            (my-error :fatal nil "Illegal binding freed in return-register")
            (let ((actual (cdr pair)))
              (setf *free-register-list* (cons actual *free-register-list*))
              (setq symbols-bound-to-regs (remove pair symbols-bound-to-regs)))))
      (my-error :fatal nil "Illegal register return")))
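
;;; Illustrative sketch, not part of the original listing.  It restates the
;;; allocation policy of the register-request code above as a pure function,
;;; assuming registers are plain symbols.  PICK-REGISTER is a hypothetical
;;; name.  R0 is handed out only when nothing else is free, because the
;;; surrounding code wants R0 available for literals that do not fit an op0
;;; operand.
(defun pick-register (free-list)
  "Return two values: the chosen register and the remaining free list.
Prefers any register other than R0; the register is NIL if FREE-LIST is empty."
  (let* ((others (remove 'r0 free-list))
         (reg (if others (first others) (first free-list))))
    (values reg (remove reg free-list))))

;;; Usage: (pick-register '(r3 r2 r1 r0)) returns R3 and (R2 R1 R0);
;;; (pick-register '(r0)) returns R0 and NIL.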

(defun binding-to-register (symbol)
  (if (and (listp symbol)
           (eq (car symbol) :binding))
      (let ((pair (assoc (second symbol) symbols-bound-to-regs)))
        (if (null pair)
            (my-error :fatal nil "Binding not found")
            `(:j-register ,(cdr pair))))))


;;; Emit commands

;; This "forces" register assignments when the code is emitted.
(defun translate-operand (op)
  (if (listp op)
      (cond ((null op) nil)
            ((eq (car op) :binding) (binding-to-register op))
            ((eq (car op) :register) (translate-virtual-register op))
            (t (cons (translate-operand (car op))
                     (translate-operand (cdr op)))))
      op))

(defvar *j-instructions* nil)
(setq *j-instructions* nil)

(defun emit-j-instruction (inst &key (pass-through nil))
  (let* ((opcode (car inst))
         (operands (if pass-through
                       (cdr inst)
                       (mapcar #'translate-operand (cdr inst))))
         (instruction (cons opcode operands)))
    (setq *j-instructions* (append *j-instructions*
                                   (list instruction)))
    ;; For trace purposes, just return latest new instruction
    instruction))


(defun emit-j-instructions (ilist)
  (mapcar #'emit-j-instruction ilist))


;;; Movement routines

(defun make-legal-move (source dest)
  (if (or (genl-purpose-reg-p source)
          (genl-purpose-reg-p dest))
      ;; At least one is a register
      (make-legal-move-with-register source dest)
      (let ((register (request-appropriate-register source)))
        (make-legal-move source register)
        (make-legal-move register dest)
        (return-register register))))

;;; Possible operands include:
;;;   (:register ..)
;;;   (:j-register ..)
;;;   (:binding ..)
;;;   (:frame (:base #))
;;;   (:tagged-literal ..)

(defun make-legal-move-with-register (source dest)
  (if (genl-purpose-reg-p source)
      (if (or (rop0-operand-p dest)
              (op0-extended-operand-p dest))
          (emit-j-instruction `(move ,source ,dest))
          ;; If we get here, source is a register, but dest is too big
          (make-legal-big-move source dest))
      (if (genl-purpose-reg-p dest)
          (if (or (rop0-operand-p source)
                  (op0-extended-operand-p source))
              (emit-j-instruction `(move ,source ,dest))
              (make-legal-big-move source dest)))))

;;; make-legal-big-move called when one operand is a register and the
;;; other is something that can't be represented in op0 or register-oriented
;;; op0 mode, such as a big literal or a frame value with a large offset.

(defun make-legal-big-move (source dest)
  (if (genl-purpose-reg-p source)
      ;; destination must be frame (or equiv.) (i.e. can't be literal)
      (let* ((offset (base-offset dest))
             (tagged-offset (make-tagged-literal offset))
             (reg (request-appropriate-register tagged-offset))
             (new-operand (replace-offset reg dest)))
        (make-legal-move tagged-offset reg)
        (make-legal-move source new-operand)
        (return-register reg))
      ;; If we get here, dest must be a gpr
      (cond ((tagged-literal-p source)
             (if (op0-literal-p source)
                 (emit-j-instruction (list 'move source dest))
                 (let ((actual-dest (translate-operand dest)))
                   (if (equal actual-dest '(:j-register R0))
                       (emit-j-instruction `(dc ,source))
                       (message :fatal nil "R0 not reserved when required")))))
            (t (my-error :fatal nil "Unhandled case in make-legal-big-move")))))


(defstruct code-bundle
  operand-list
  regs-to-be-freed)

(defun bundle-return-registers (bundle)
  (mapcan #'return-register (code-bundle-regs-to-be-freed bundle))
  (setf (code-bundle-regs-to-be-freed bundle) nil)
  bundle)

(defun convert-cj-to-sj (cb)
  (mapc #'make-legal (dataflow-graph-root-set cb))
  (setf (dataflow-graph-root-set cb) *j-instructions*)
  cb)

(defun make-legal (instruction)
  (let* ((opcode (car instruction))
         (operands (if (eq opcode 'hybrid-instruction)
                       (cdr instruction)
                       (mapcar #'make-tagged-literal-if-appropriate (cdr instruction))))
         (num-ops (length operands))
         (instruction (cons opcode operands)))
    (if (pseudo-op-p opcode)
        (process-pseudo-op opcode operands)
        (cond ((= num-ops 0)  ; Typically, suspend
               (emit-j-instruction instruction))
              ((= num-ops 1)  ; Typically, send or branch
               (if (eq opcode 'br)
                   (make-branch opcode operands)
                   (make-into-form opcode
                                   operands
                                   (cons #'ext-op0 'source))))
              ;; Shouldn't something for branches be here?
              ((= num-ops 2)  ; Typically move, unary op, or bcc
               (cond ((equal opcode 'move)
                      (make-legal-move (first operands) (second operands)))
                     ((or (equal opcode 'neg) (equal opcode 'not) (equal opcode 'rtag))
                      (make-into-form opcode
                                      operands
                                      (cons #'ext-op0 'source)
                                      (cons #'gpr 'dest)))
                     ((member opcode '(bf bt bz bnz))
                      (make-branch opcode operands))
                     (t
                      (message :fatal nil "Illegal opcode in make-legal"))))
              ((= num-ops 3)  ; Typically binary op (all have same format)
               ;; It should try exchanging the first two operands to execute more cheaply
               (make-into-form opcode
                               operands
                               (cons #'gpr 'source)
                               (cons 'op0 'source)
                               (cons 'gpr 'dest)))))))

;; Some conditional branches can't be encoded into one instruction; additionally, in
;; my simple one-pass assembler, I can't determine displacements, etc.  Hence, all
;; jumps will be converted in a pessimistic way, e.g.
;;      bz t1, label1
;;
;;      bnz t1, new_label
;;      br label1
;;    new_label:
;; The types of branches are: bf, bt, bz, bnz, bnil, bnnil.
;; (The last two aren't used by hybrid stuff but are in for completeness.)

(defvar branch-opposites '((bf . bt) (bz . bnz) (bnnil . bnil)
                           (bt . bf) (bnz . bz) (bnil . bnnil)))

(defun make-branch (opcode operands)
  (if (eq opcode 'br)  ; Absolute branch
      (make-legal `(move ,(car operands) (:j-register ip)))
      (let (;(new-label (gensym 'jmp))
            (opposite-opcode (cdr (assoc opcode branch-opposites)))
            (condition (first operands))
            (original-label (second operands)))
        ;; The following line would give us an infinite loop!
        ;; (make-legal `(,opposite-opcode ,condition (:label ,new-label)))
        ;; Instead, do a violation of abstraction:
        (make-into-form opposite-opcode `(,condition (:tagged-literal ,int-tag 3))
                        (cons #'gpr 'source) (cons #'ext-op0 'source))
        (make-legal `(br ,original-label))
        (make-legal `(align))
        ;; (make-legal `(label (:literal (:symbol ,new-label))))
        )))
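
;;; Illustrative sketch, not part of the original listing.  It shows the
;;; pessimistic branch expansion described above as a pure function that
;;; returns the instruction list instead of emitting it.  The name
;;; EXPAND-CONDITIONAL-BRANCH and the :SKIP marker are hypothetical; the
;;; listing itself encodes the skip as a tagged integer displacement.
(defun expand-conditional-branch (opcode condition label)
  "Return the inverted branch, an unconditional branch to LABEL, and an align."
  (let ((opposite (cdr (assoc opcode '((bf . bt) (bt . bf)
                                       (bz . bnz) (bnz . bz)
                                       (bnil . bnnil) (bnnil . bnil))))))
    (unless opposite
      (error "Not a conditional branch: ~S" opcode))
    (list (list opposite condition :skip)
          (list 'br label)
          (list 'align))))

;;; Usage: (expand-conditional-branch 'bz 't1 'label1) returns
;;; ((BNZ T1 :SKIP) (BR LABEL1) (ALIGN)).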

(defun replace-offset (reg operand)
  (list (car operand)
        (list
         (caadr operand)
         reg)))

(defun gpr (arg dir bundle)
  (if (genl-purpose-reg-p arg)
      (make-code-bundle
       :operand-list (append (code-bundle-operand-list bundle) (list arg))
       :regs-to-be-freed (code-bundle-regs-to-be-freed bundle))
      (make-move-with-register arg dir bundle)))

(defun make-move-with-register (arg dir bundle)
  (let ((reg (request-appropriate-register arg)))
    (if (eq dir 'source)
        (progn
          (make-legal-move arg reg)
          (make-code-bundle
           :operand-list (append (code-bundle-operand-list bundle) (list reg))
           :regs-to-be-freed (append (code-bundle-regs-to-be-freed bundle) (list reg))))
        ;; dest
        (progn
          (make-legal-move reg arg)
          (make-code-bundle
           :operand-list (append (code-bundle-operand-list bundle) (list reg))
           :regs-to-be-freed (append (code-bundle-regs-to-be-freed bundle) (list reg)))))))

(defmacro base-tagged-offset (a)
  `(make-tagged-literal (base-offset ,a)))

(defun op0 (arg dir bundle)
  (if (op0-operand-p arg)
      (make-code-bundle
       :operand-list (append (code-bundle-operand-list bundle) (list arg))
       :regs-to-be-freed (code-bundle-regs-to-be-freed bundle))
      (make-big-item-into-gpr arg dir bundle)))

(defun make-big-item-into-gpr (arg dir bundle)
  ;; There are two possibilities:
  ;; (1) it is a frame reference that we could convert (in which case direction is irrelevant)
  (if (eq (car arg) :frame)
      (let* ((value (base-tagged-offset arg))
             (reg (request-appropriate-register value)))
        (make-legal-move value reg)
        (make-code-bundle
         :operand-list (append (code-bundle-operand-list bundle) (list (replace-offset reg arg)))
         :regs-to-be-freed (append (code-bundle-regs-to-be-freed bundle) (list reg))))
      ;; (2) It must be stored into a separate register
      (make-move-with-register arg dir bundle)))


(defun ext-op0 (arg dir bundle)
  (if (op0-extended-operand-p arg)
      (make-code-bundle
       :operand-list (append (code-bundle-operand-list bundle) (list arg))
       :regs-to-be-freed (code-bundle-regs-to-be-freed bundle))
      (make-big-item-into-gpr arg dir bundle)))

(defun guaranteed-ok (arg dir bundle)
  bundle)

(defun process-operand-if-source (operands patterns count bundle)
  (if (>= count (length operands))
      bundle
      (let ((op (nth count operands))
            (pat (nth count patterns)))
        (if (eq (cdr pat) 'source)
            (apply (car pat) (list op 'source bundle))
            bundle))))

(defun symbol> (x y)
  (string> (string x) (string y)))

(defun process-operand-if-dest (op pat bundle)
  ;; Process only if destination
  (if (neq (cdr pat) 'dest)
      nil
      (list
       (if (genl-purpose-reg-p op)
           op
           (let ((reg (request-any-register)))
             (setf (code-bundle-regs-to-be-freed bundle)
                   (cons reg (code-bundle-regs-to-be-freed bundle)))
             reg)))))

;; Unfortunately, it seems we have to code in some specifics to keep
;; the code from being too complex.  The assumptions are:
;;
;; - An instruction has up to two sources.
;;
;; - The last operand is the only one that can be a destination.
;;   If it is a destination, it is also a gpr (except for moves,
;;   which are handled specially).

(defun make-into-form (opcode operands &rest pattern)
  ;; First, check that same # of operands as patterns
  (if (/= (length operands)
          (length pattern))
      (my-error :fatal nil "Not enough operands for pattern")
      ;; Generate the code for up to two sources and up to one dest
      (let* ((step-one (process-operand-if-source operands pattern 0 (make-code-bundle)))
             (bundle (process-operand-if-source operands pattern 1 step-one))
             (dest-reg (process-operand-if-dest (car (last operands))
                                                (car (last pattern))
                                                bundle)))  ; ! Bundle mutated !
        ;; Emit instruction
        (emit-j-instruction (cons opcode (append (code-bundle-operand-list bundle)
                                                 dest-reg)))
        ;; Emit the code (if any) to put result into destination
        (if (and dest-reg
                 (not (equal dest-reg (last operands))))  ; eq and eql too strong for lists
            (make-legal-move (car dest-reg) (car (last operands))))
        ;; Free registers
        (mapcar #'return-register (code-bundle-regs-to-be-freed bundle)))))


;;; Pseudo-op functions, for :reserve and :free, :suspensive*, and :label
;;;
;;; - (:reserve <symbol>) and (:free <symbol>) are used to bind the value
;;;   of the symbol so that (:register <symbol>) is meaningful.  The
;;;   usage is of the form:
;;;     (:reserve (:register scratch))
;;;     (:move (:j-register A2) (:register scratch))
;;;
;;;     (:free (:register scratch))
;;;   The usage is purposely verbose, to allow a change of representation,
;;;   as well as error-checking.  (Reserving a second register of the same
;;;   name, using a nonreserved register, and freeing a nonreserved register
;;;   are all errors.)

(defvar *virtual-registers*)

(defvar *pseudo-op-list*)

(defun pseudo-op-p (op)
  (assoc op *pseudo-op-list*))

(defun process-pseudo-op (opcode operands)
  (apply (cdr (assoc opcode *pseudo-op-list*)) (list operands)))

(defun reserve-virtual-register (operands)
  (let* ((operand (first (car operands)))
         (name (second (car operands))))
    (if (neq operand :register)
        (my-error :fatal nil "Illegal :reserve syntax")
        ;; Check if it's already allocated
        (if (assoc name *virtual-registers*)
            (my-error :fatal nil "An attempt was made to re-allocate a virtual register")
            (let ((reg (request-any-register)))  ; This is a TEMPORARY measure -- it might need R0
              (setq *virtual-registers*
                    (cons (cons name reg)
                          *virtual-registers*))))))
  nil)

(defun free-virtual-register (operands)
  (let ((operand (first (car operands)))
        (name (second (car operands))))
    (if (neq operand :register)
        (my-error :fatal nil "Illegal :free syntax")
        ;; Check if it's already allocated
        (if (assoc name *virtual-registers*)
            (progn
              (return-register (cdr (assoc name *virtual-registers*)))
              (setq *virtual-registers*
                    (remove (assoc name *virtual-registers*)
                            *virtual-registers*)))
            (my-error :fatal nil "An attempt was made to free an unallocated virtual register"))))
  nil)
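
;;; Illustrative sketch, not part of the original listing.  It checks the
;;; three error conditions named in the comments (reserving a name twice,
;;; using a nonreserved register, freeing a nonreserved register) over a list
;;; of cj instructions.  It assumes instructions of the shape
;;; (reserve (:register name)) and (free (:register name)), and it inspects
;;; only top-level operands.  CHECK-RESERVE-FREE is a hypothetical name.
(defun check-reserve-free (instructions)
  "Return T if no reserve/free rule is violated in INSTRUCTIONS, else NIL."
  (let ((live '()))
    (dolist (inst instructions t)
      (let ((opcode (first inst))
            (operands (rest inst)))
        (case opcode
          (reserve
           (let ((name (second (first operands))))
             (when (member name live)      ; reserved a second time
               (return nil))
             (push name live)))
          (free
           (let ((name (second (first operands))))
             (unless (member name live)    ; freed without a reserve
               (return nil))
             (setf live (remove name live))))
          (t
           (dolist (op operands)
             (when (and (consp op)
                        (eq (first op) :register)
                        (not (member (second op) live)))
               ;; used without a reserve
               (return-from check-reserve-free nil)))))))))

;;; Usage: a reserve / use / free sequence for the same name returns T, while
;;; (check-reserve-free '((free (:register scratch)))) returns NIL.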

;; There are two things this could be called for:
;;   (:register <name>)
;; or
;;   (:register #)
;; The meanings are very different.  The first was a temporary assigned by (my) hybrid-to-cj.
;; The latter was a temporary assigned by Iannucci's generate-md-instruction.  Both map to
;; the same thing however.  For now, use [#,A0] for the latter.  Inefficient, but correct.
;; I implement this in hybrid-to-cj but describe it here, because the real fix should be here.
;; A solution I considered but which is NOT implemented:
;; Because reserve & free are emitted for the first type and omitted
;; for the second, we have to assume a little: A first access is an implicit reserve, and a
;; second is an implicit free.  This matches how Iannucci uses registers (I think!) for
;; non-loop-setup.

(defun translate-virtual-register (ref)
  (let ((name (second ref)))
    (translate-operand (cdr (assoc name *virtual-registers*)))))

;; Originating J-machine code here might be something of a violation of abstraction.

(defvar suspensive-binding)

(defun make-suspensive-instruction-code (dummy)
  (let* ((name (gensym 'suspensive))
         (label `(:literal (:symbol ,name)))
         (ref (make-tagged-literal `(:ref ,name))))
    (make-legal `(label ,label))
    (setq suspensive-binding (request-appropriate-register ref))  ; R0
    (make-legal `(move (:message (:base 1)) (:j-register A2)))
    (make-legal-move ref suspensive-binding)))
;    (align)
;    (make-legal `(move (:j-register ip) (:message (:base 0))))
;    (make-legal `(move (:message (:base 1)) (:j-register A2)))))

;; This is inefficient.
(defun make-presence-check (operands)
  (let ((op (car operands))
        (reg (request-any-register)))
    (make-legal `(rtag ,op ,reg))
    (return-register reg)))

(defun end-suspensive-part (dummy)
  (return-register suspensive-binding)
  (setq suspensive-binding nil))

(defun handle-label (operands)
  (emit-j-instruction (list 'label (car operands))))

(defun pass-through-hybrid-instruction (operands)
  (emit-j-instruction `(hybrid-instruction ,@operands) :pass-through t))

(setq *pseudo-op-list* (list (cons 'reserve #'reserve-virtual-register)
                             (cons 'free #'free-virtual-register)
                             (cons 'suspensive-instruction #'make-suspensive-instruction-code)
                             (cons 'suspensive-operand #'make-presence-check)
                             (cons 'suspensive-check-done #'end-suspensive-part)
                             (cons 'label #'handle-label)
                             (cons 'hybrid-instruction #'pass-through-hybrid-instruction)))
C.3 Convert Simple J to Assembly


;;; Mode:Common-Lisp; Package:ID-COMPILER; Base:10

;;;; sj-to-asm.lisp converts MDP code from s-expressions into format suitable for MDPSim.
;;;;  - Convert from s-expressions to strings,
;;;;    which includes putting in commas and newlines
;;;;  - Replacing characters like : and - with _ to
;;;;    make legal MDP identifiers.
;;;;  - Handles references, labels, and symbols.

(in-package 'id-compiler)

(defcompiler-module convert-sexp-j-to-asm id-compiler
  (:input md-instructions code-block)  ; A lie
  (:before-function procedure reset-sj-to-asm-system)
  ; (:options md-output-file)
  (:function convert-sj-to-asm))

(defmacro cat (&rest args)
  `(concatenate 'string ,@args))

(defvar *output-string*)

(defvar *msg-ref-list*)

(defvar *ip-ref-list*)

(defvar operand-list)

(defun reset-sj-to-asm-system ()
  (setq *ip-ref-list* nil)
  (setq *msg-ref-list* nil)
  (setq *output-string* ""))

(defun make-j-string (sym)
  (let ((s (copy-seq (my-string sym))))
    (make-j-string-inner s 0)
    s))

(defun make-j-string-inner (s index)
  (if (< index (length s))
      (let ((c (char s index)))
        (if (or (eql c #\:)
                (eql c #\-))
            (setf (char s index) #\_))
        (make-j-string-inner s (1+ index)))))
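
;;; Illustrative sketch, not part of the original listing.  The character
;;; replacement performed by make-j-string can also be written without
;;; explicit recursion, assuming (as the header comment suggests) that the
;;; replacement character is an underscore.  SANITIZE-IDENTIFIER is a
;;; hypothetical name.
(defun sanitize-identifier (name)
  "Return a string in which every : and - in NAME is replaced by _."
  (substitute-if #\_
                 (lambda (c) (member c '(#\: #\-)))
                 (string name)))

;;; Usage: (sanitize-identifier "loop-setup:1") returns "loop_setup_1".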

(defun asm-output-opcode (opcode)
  (setq operand-list nil)
  (if opcode
      (setq *output-string* (cat *output-string* (format nil "~%~T~A~T" (string opcode))))
      (setq *output-string* (cat *output-string* (format nil "~%")))))

(defun asm-output-label (l)
  (asm-output-opcode nil)
  (asm-output-operand (cat (make-j-string (second (third l))) ":"))
  (asm-output-end-line))

(defun asm-output-align ()
  (asm-output-opcode nil)
  (asm-output-operand ":")
  (asm-output-end-line))

(defun asm-output-comment (text)
  (setq *output-string* (cat *output-string* (format nil "~%~%;~S" text))))

(defun asm-output-operand (operand)
  (setq operand-list (nconc operand-list (list operand))))

(defun asm-output-end-line ()
  (asm-output-end-line-inner (length operand-list) operand-list))

(defun asm-output-end-line-inner (len ops)
  (if (> len 0)
      (progn
        (setq *output-string* (cat *output-string* (first ops)))
        (if (> len 1)
            (setq *output-string* (cat *output-string* ", ")))
        (asm-output-end-line-inner (- len 1) (cdr ops)))))

(defvar *current-frame-descriptor*)

(defun convert-sj-to-asm (cb)
  (let ((name (dataflow-graph-get cb :procedure-name))
        (instructions (dataflow-graph-root-set cb)))
    ;; Yuck: Do this right.  On second thought, don't bother.
    (setq *current-frame-descriptor* (dataflow-graph-get cb :frame-descriptor))
    (mapc #'convert-sj-instruction-to-asm instructions)
    (let ((filename (open (make-pathname :type "MDP"
                                         :defaults (cat "o;>ellent>" (string name)))
                          :direction :output)))
      (princ *output-string*)
      ;; Output the module
      (format filename "module ~a~%" name)
      (princ *output-string* filename)
      (format filename "~%end~%")
      ;; Output the references
      (loop for ref in (set-difference *msg-ref-list* '(local.moer local.gate))
            doing (format filename "ref ~a_msg_ref = MSG:(((~a+~a_loc)<<~d))+2~%"
                          (make-j-string ref)
                          (make-j-string ref)
                          name
                          *sys-len-bits*))
      (loop for label in *ip-ref-list*
            doing (format filename "ref ~a_ip_ref = IP:(((~a+~a_loc)<<~d))+ABSOLUTE~%"
                          (make-j-string label)
                          (make-j-string label)
                          name
                          *sys-len-bits*))
      ;; Bogus for loops
      (format filename "ref ~a_codeblock_ref = CB:(~a_loc<<16)+~D~%"
              (dataflow-graph-get cb :procedure-name)
              (dataflow-graph-get cb :procedure-name)
              (frame-descriptor-next-available-scratch-slot *current-frame-descriptor*))
      (close filename))))

(defun convert-sj-instruction-to-asm (instruction)
  (let ((operator (car instruction)))
    (cond ((eq operator 'label)  ; special cases
           (asm-output-label (cadr instruction)))
          ((eq operator 'align)
           (asm-output-align))
          ((eq operator 'hybrid-instruction)
           (begin-hybrid-instruction-conversion (cdr instruction)))
          (t
           (asm-output-opcode operator)
           (mapc #'convert-sj-operand-to-asm (cdr instruction))
           (asm-output-end-line)))))

(defun begin-hybrid-instruction-conversion (text)
  (asm-output-comment text))

(defun convert-sj-operand-to-asm (operand)
  (asm-output-operand
   (case (car operand)
     ((:tagged-literal) (output-tagged-literal operand))
     ((:j-register) (my-string (second operand)))
     ((:frame) (format nil "[~S,A1]" (cadadr operand)))
     ((:message) (format nil "[~S,A3]" (cadadr operand)))
     ((:temporary) (format nil "[~S,A0]" (cadadr operand)))
     )))

(defun output-tagged-literal (operand)
  (let ((tag (second operand)))
    (if (eq tag special-tag)
        ;; Everything as REFs not labels (labels would be more appropriate for branches)
        (cond ((eq (car (third operand)) :code-block)
               ;; It goes without saying that the code-block ref will be output
               (cat "{" (make-j-string (second (third operand))) "_codeblock_ref}"))
              ((eq (car (third operand)) :ref)
               (setq *msg-ref-list* (remove-duplicates (cons (second (third operand))
                                                             *msg-ref-list*)))
               (cat (make-j-string (second (third operand))) "_msg_ref}"))
              ((eq (car (third operand)) :label)
               (setq *ip-ref-list* (remove-duplicates (cons (second (third operand)) *ip-ref-list*)))
               (cat (make-j-string (second (third operand))) "_ip_ref}"))
              (t
               (break)))
        (cond ((eq tag int-tag)
               (format nil (third operand)))
              ((and (eq tag boolean-tag) (numberp (third operand)))
               (if (= 0 (third operand))
                   "false"
                   "true"))
              (t
               (format nil (string tag) (third operand)))))))

(defun my-string (x)
  (if (numberp x)
      (format nil "~D" x)
      (string x)))